Preface


This Big Five Assessment volume bridges a gap in the personality literature and 
represents a unique opportunity for students, researchers, and practitioners to find 
just about everything concerning the evaluation of the Big Five within the cover of a 
single book. The volume aims to present a comprehensive account of the various 
instruments that are constructed to assess the Big Five factors, often named (I) Extraversion,
(II) Agreeableness, (III) Conscientiousness, (IV) Emotional Stability or
Neuroticism, and (V) Intellect, Intellectual Autonomy, or Openness to Experience.
Both well-known and frequently used and less well-known or less frequently used 
inventories, questionnaires, and trait adjective lists are described. During the last 
decade or so, the field of personality assessment has rapidly changed its orientation 
towards a full and extensive account of the constructs and sub-constructs that are 
captured by the Big Five model. One of the persistent issues in the literature on the
Big Five factors is the great variety of forms each of the factors can take, even
where the names of the factors are the same. Extraversion, for example, may sometimes
take a more expressive connotation, and at other times a more energetic or
even aggressive connotation. Factors from different studies may allow for two, 
three, four, five, or more facets. The formats of the original items may be abstract 
and adjectival or more behavioral. All of these issues come about most explicitly in
the development of assessment instruments. This volume offers a view not only on
the different instruments, their uses, and their applications, but also on the struggles
researchers may go through in striving to operationalize their conceptualizations of
the Big Five constructs.

This book has been preceded by a few others that had the Big Five model as a 
main theme, such as Costa and Widiger's (1994) volume dedicated to personality 
disorders, Halverson, Kohnstamm, and Martin's (1994) volume dedicated to devel- 
opmental aspects of the five-factorial model, Wiggins' (1996) volume on theoretical 
aspects of the Five-Factor Model, and De Raad's (2000) monograph on the psy- 
cholexical approach to personality. The latter book had a restricted focus on largely 
historical, procedural, and theoretical considerations that gave rise to the present day 
formulation of the Big Five model. That book was originally intended to have an 
assessment chapter with quite an extensive scope, but that plan soon outgrew its 
original frame. That chapter was dropped, a decision that was immediately followed
by a much more ambitious plan, namely to review all Big Five assessment instruments
in a separate handbook. That plan is now virtually realized. This Big Five
Assessment book contains 20 chapters with descriptions of at least 18 different in- 
struments that can be used to assess the Big Five factors in a variety of contexts – 
organizational, clinical, developmental, research. The mere conjoining of instru- 
ments that are competitors in the commercial world is an interesting fact in itself, 
which does not mean that the contributors perform a concert in close harmony. On 
the contrary, opposing views on various issues are spelled out in different chapters, 
a fact that attests to the liveliness of the field: it signals an approach at work and it
testifies that there are still unresolved aspects.

This book is the product of many individuals. Thanks are due to the contributors 
who made this book a real handbook of Big Five assessment. We would like to
acknowledge those who served as manuscript reviewers, in particular Alois 
Angleitner, Filip De Fruyt, Jolijn Hendriks, Wim Hofstee, Karen Van Oudenhoven- 
Van der Zee, and Fons Van de Vijver. Special thanks should go to Hanny Baan who 
processed the text to the present form from the first to the last page. 


Boele De Raad 
Marco Perugini 
Chapter 1


Big Five factor assessment: Introduction 


Boele De Raad 
Marco Perugini 


Introduction 


The Big Five model of personality traits derives its strength from two lines of re- 
search, the psycholexical and the questionnaire tradition (John & Srivastava, 1999; 
McCrae & John, 1992). While the names Big Five model and Five Factor Model are 
often used interchangeably, they respectively originate in those two traditions. The 
two traditions have produced similar five-factor structures that mark a point of no 
return for personality psychology. Extensive reviews of history and theory with res- 
pect to the Big Five can be found in De Raad (2000) and Wiggins (1996). 

The Big Five factors have been endowed with a distinctive status, derived from
the extensive, omnibus character of the underlying psycholexical approach, and
based on two characteristics, namely its exhaustiveness in capturing the semantics of
personality and its recourse to ordinary language. Though both these characteristics
may be improved upon, the psycholexical approach outranks other approaches to
personality in semantic coverage, and it has optimized the
level of communication on personality traits by relying solely on readily intelligible
units of description.

Of the Big Five factors, Extraversion and Neuroticism had been identified as the 
"Big Two" by Wiggins (1968) within the questionnaire approach because these two 
dimensions had shown up in most personality questionnaires. Costa and McCrae 
(1976, 1985) added a third dimension, Openness to Experience, with which they 
touched a large audience, but with their addition of Agreeableness and Conscientiousness
the questionnaire approach "cashed in" more fully, so to speak, on the fruits
of the psycholexical approach. Since then, the questionnaire approach has been tuned
towards the coverage of especially those five dimensions. Several studies (e.g.,
Angleitner & Ostendorf, 1994) have been supportive of a Big Five factor structure
among scales from different instruments (see, e.g., John & Srivastava (1999) for a 
portrayal of the convergence between the Big Five and other structural models).
With the emphasis on full coverage of the trait domain, and thus on a representation
of the lexicon of trait-descriptive items, the questionnaire approach, too, essentially
turned lexical.

The Big Five model has served as a basis for the development of assessment instruments
of various kinds. In the paragraphs that follow, different assessment forms based
on the Big Five model are briefly described, including Big Five trait-markers,
Big Five inventories, and some instruments that have been shaped after the Big Five 
framework. Different examples of each of those forms can be found in the various 
chapters of this book. These brief descriptions are preceded by a discussion of the 
different uses of the Big Five model, and by a description of the Big Five constructs 
and their relevance. 


Uses of the Big Five model 


With reference to the ordinary language character of the psycholexical approach and 
of the Big Five model, one interesting use of the model is that it may serve as a 
standard medium of communication, in terms of which other psychological- 
technical concepts can be expressed. For example, Strelau's Strength of Excitation 
(SE; Strelau, Angleitner, Bantelmann & Ruch, 1990) correlates substantially with 
Big Five factors Extraversion and Emotional Stability, and Eysenck's Psychoticism 
(P; Eysenck, 1992) factor correlates substantially (negatively) with Big Five 
Agreeableness and Conscientiousness. Because of those relationships, the Big Five 
factors give insight into the semantics of Strength of Excitation and Psychoticism.

Apart from this, and especially because of its distinctive status, the Big Five mo- 
del can function, and has done so, as a template, suggesting a general function of the 
model as a reference-framework, underlying various uses. The model can thus serve 
as a reference system for other systems, suggesting that with the Big Five factors 
one can find a peg for almost every hole. 

While the Big Five approach may boast this distinctive status, the Big Five factors
also form a model of personality traits next to other models. In that capacity, it
contains a well-founded set of basic concepts with which traits of persons can be 
described at an abstract level. At that abstract level, the model has a certain range of 
coverage by which it can compete with other models. In addition, the model can serve
as a basis for the development of assessment instruments. In this respect, too, the
psycholexical approach provides a good starting-point to compete with other
systems of assessment. The circumplex representation, especially in its Abridged
Big Five Circumplex (AB5C) format (De Raad, Hendriks, & Hofstee, 1992;
Hofstee, De Raad, & Goldberg, 1992), with its systematic and detailed stratification 
of the semantic domain, gives the psycholexical approach the extra dimension that 
sustains its competitive status. Moreover, because alternative methods to arrive at 
domain-covering assessment-instruments are restricted by theory or problem- 
orientation, the psycholexical approach justifiably claims its exceptional position, 
a position that, obviously, can ultimately be maintained only on the basis of empirical
results.

In addition to developing assessment instruments, some other uses of the Big
Five model and of the AB5C-system are distinguished. These are the use of the
AB5C-system and of the Big Five model to classify various kinds of personality descriptors,
the use of the AB5C-system to clarify discrepancies in interpretation of
Big Five factors (e.g., Johnson & Ostendorf, 1993), and the role of the Big Five model
in theory-building. These various uses of the Big Five not only sustain the template
function of the model, but also illustrate its relevance.

The different uses and functions of the Big Five model provided here do not ex- 
haust the possible roles the model may fulfil; they are given to showcase the poten- 
tial of the model in many different directions. The Big Five model is used in many 
different types of investigations, running from the judgments of faces (Henss, 1995), 
to the comparison of polar workers with a normative population (Steel, Suedfeld,
Peri, & Palinkas, 1997), to the construct validation of the concept of ‘argumentativeness’
(Blickle, 1997).


The Big Five as a classification-system 


We give two examples that showcase the Big Five framework as organizer of se- 
mantic material collected for the purpose of describing individual differences. In 
both cases, the semantic material had not been collected from the Big Five view- 
point. Yet, it turned out that a description of the semantics of individual differences 
in those studies did not need important categories beyond the confines of the Big 
Five. 

Assuming that the AB5C-framework provides a fair and full map of the semantics
of personality traits, De Raad and Doddema-Winsemius (1999) used that system
to accommodate a list of 323 instincts. That list, being a reduction of a larger list of 
985 instinct-references, had been put together according to a psycholexical pattern in
which hundreds of books, articles, and newspapers were carefully examined for
instinct-expressions, many of which turned out to be descriptors of individual differences
in personality (Bernard, 1924). Of the 323 instincts, 238 could be classified, leaving
a set of 85 terms that could not be placed, mainly because they were too
ambiguous in meaning, or too specific. The majority of instinct-terms could thus be
classified into the five categories corresponding to the Big Five. The results
support the inclusive power of the semantics of the Big Five framework.
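
The counts reported for the instinct classification can be checked with a few lines of arithmetic. The sketch below only restates the figures given in the text (985 references, 323 listed instincts, 238 classified, 85 unclassified); the variable and function names are illustrative and are not taken from the original study.

```python
# Counts as reported in the text for the instinct classification.
total_references = 985   # instinct-references in the larger list
listed_instincts = 323   # reduced list submitted to classification
classified = 238         # terms placed in one of the Big Five categories
unclassified = 85        # terms too ambiguous or too specific to place

# The classified and unclassified terms should account for the whole list.
assert classified + unclassified == listed_instincts


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


classified_pct = share(classified, listed_instincts)
unclassified_pct = share(unclassified, listed_instincts)
retained_pct = share(listed_instincts, total_references)

print(f"classified:   {classified_pct}%")
print(f"unclassified: {unclassified_pct}%")
print(f"share of the 985 references retained in the list: {retained_pct}%")
```

On these figures, roughly three quarters of the listed instinct-terms found a place within the Big Five categories.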

Kohnstamm, Halverson, Mervielde, and Havill (1998) used the Big Five catego- 
ries to classify free descriptions of children provided by parents (cf. Havill, Allen, 
Halverson, & Kohnstamm, 1994; Kohnstamm, Slotboom, & Elphick, 1993). Kohn- 
stamm et al. (1998) summarized studies performed in seven different countries, each 
study revolving around the question put to parents: "Can you tell me what you think 
is characteristic of your child". Each study produced a host of words and phrases to 
be considered as representing the cultural lexicon of personality. In order to classify 
the many thousands of descriptors, 14 categories were used, five of which were formed by
the Big Five labels. The vast majority of expressions were accommodated by the Big
Five, which provides overwhelming support for the inclusive power of the Big
Five categories. The other categories that were considered necessary for classification
caught less than 20 per cent of the expressions on average.


Role of Big Five in theory building 


A theory is a set of conventions that make it possible to represent empirical facts in an
organizing and integrating scheme. With its simplified and abstract form, the Big
Five scheme also forms a theory that allows the incorporation of empirical findings. Its
utility is in its descriptive scope. The inclusive potential of the Big Five system has 
been demonstrated in the preceding paragraphs. The structural features of the Big 
Five system, with horizontal and vertical aspects, and with the periodicity or circumplexity
of the AB5C representation, provide ample opportunity for the clarification
and scrutiny of the interrelationships of the many trait concepts that belong to
the personality trait domain. 

A very instructive example in this respect is the study by Digman (1997), who 
analyzed factor correlations from 14 studies. In all of these studies five primary 
factors had been produced, identified as the Big Five. The 14 sets of correlations 
between those Big Five factors were factored to produce higher-order factors. In all 
of the 14 studies two factors were typically evident, initially labeled alpha and beta. 
Factor alpha was indicated by Big Five factors Agreeableness, Emotional Stability, 
and Conscientiousness, and factor beta was indicated by Extraversion and Intellect 
or Openness to Experience. 
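
How correlations among five primary factors can yield two higher-order factors may be easier to see in a small numerical sketch. The correlation matrix below is invented for illustration and is not taken from Digman (1997) or from any of the 14 studies. Principal-component extraction by power iteration is used here as a simple stand-in for the factor-analytic procedures of the original study, and all function names are hypothetical.

```python
# Hypothetical correlations among Big Five factor scores (NOT Digman's data).
# Order: Extraversion, Agreeableness, Conscientiousness,
#        Emotional Stability, Intellect.
LABELS = ["E", "A", "C", "ES", "I"]
R = [
    [1.0, 0.1, 0.1, 0.1, 0.4],
    [0.1, 1.0, 0.4, 0.4, 0.1],
    [0.1, 0.4, 1.0, 0.4, 0.1],
    [0.1, 0.4, 0.4, 1.0, 0.1],
    [0.4, 0.1, 0.1, 0.1, 1.0],
]


def mat_vec(m, v):
    """Multiply a square matrix by a vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def normalize(v):
    """Scale a vector to unit length."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]


def top_component(m, iterations=500):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    v = normalize([1.0 + 0.1 * i for i in range(len(m))])
    for _ in range(iterations):
        v = normalize(mat_vec(m, v))
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(m, v)))
    return eigenvalue, v


def deflate(m, eigenvalue, v):
    """Remove an extracted component from the matrix."""
    n = len(m)
    return [[m[i][j] - eigenvalue * v[i] * v[j] for j in range(n)]
            for i in range(n)]


def principal_components(m, k):
    """Extract the first k principal components of a symmetric matrix."""
    work = [row[:] for row in m]
    found = []
    for _ in range(k):
        eigenvalue, v = top_component(work)
        found.append((eigenvalue, v))
        work = deflate(work, eigenvalue, v)
    return found


components = principal_components(R, 3)
for number, (eigenvalue, vector) in enumerate(components, start=1):
    # Unrotated loadings: eigenvector scaled by the square root of the eigenvalue.
    loadings = {label: round(x * eigenvalue ** 0.5, 2)
                for label, x in zip(LABELS, vector)}
    print(number, round(eigenvalue, 3), loadings)
```

With this made-up matrix, two components have eigenvalues above 1, and the second unrotated component contrasts Agreeableness, Conscientiousness, and Emotional Stability with Extraversion and Intellect. A rotation of the two components would still be needed to obtain separate alpha-like and beta-like factors.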

Apart from the hierarchical organization of the trait constructs, there is the possi- 
bility, as suggested by Digman (1997), that these two higher-order factors link the 
Big Five to various theoretical systems of personality. Factor alpha might be a social 
desirability factor; alternatively the factor may represent the socialization process or 
represent what personality development is about. Factor beta may be conceived of as 
a personal growth factor, reflecting the actualization of self, being open to experien- 
ce, and using one's intellect. 

A different approach towards assessing relationships among Big Five factors is
followed by Hofstee (2001). Because the large majority of socially desirable traits
form a positive manifold, and their undesirable opposites form a negative manifold,
it makes sense, parallel to conceptualizations of intelligence, to speak of a general
personality factor, called the p-factor. Hofstee proposes this single personality factor 
as the top trait in a sophisticated hierarchy, the meaning of which might be social 
desirability, or more probably, a construct that may stand midway between compe- 
tence and coping. 

Progress is also being made at the more specific level of this hierarchical model.
Peabody and Goldberg (1989) suggested that each of the Big Five factors might 
have its own realm of application; for example, Conscientiousness was suggested to 
have particular relevance in the realm of work and tasks. Such information is important
if the Big Five model is to bear fully on everyday and professional
contexts. Ten Berge and De Raad (2001, in press) report on a first situational
specification of traits from the Big Five domain. They not only developed a taxonomy
of situations that is particularly relevant for trait psychology; they also found
clear indications that the so-called temperament factors, Extraversion, Emotional 
Stability, and also factor V, Intellectual Autonomy, give rise to more situational dif- 
ferentiation than the so-called character-factors, Agreeableness and Conscientious- 
ness (Ten Berge & De Raad, in press). 


The Big Five as organizer in fields of experience and research 


The Big Five factors have become acknowledged or re-acknowledged as relevant 
and valid dimensions of personality in various fields of research. Barrick and Mount 
(1991), for example, who investigated the role of the Big Five factors relative to job 
performance, judged the availability of an orderly classification scheme like the Big 
Five as essential for the communication and accumulation of empirical findings, in 
any field of science. Their meta-analysis of the relation of the Big Five to job per- 
formance criteria for five occupational groups showed Conscientiousness to be a 
valid predictor for all groups, and Extraversion and Openness to Experience to play 
a more restricted role. 

Another example is provided by Smith and Williams (1992), who expected the Big Five model
to facilitate progress in the study of how personality influences health. They reviewed
literature concerning the question whether personality is causally related to
physical illness, and investigated whether traits from the Big Five have been
specifically involved. Smith and Williams (1992) concluded that a more coherent
conceptual and empirical foundation for the study of personality and health would
likely result from efforts to apply the five-factor model.

De Raad and Schouwenburg (1996) reviewed the literature on personality in lear- 
ning and education. They exploited the Big Five model as a reference system for the 
evaluation of the comprehensiveness of the literature reviewed, and in particular 
they organized the literature on personality, learning and education using the perio- 
dic system of the Big Five factors and facets (AB5C) as an accommodative frame- 
work. 

Many more examples of exploitation of the Big Five framework can be mentio- 
ned, such as in behavior genetics where the Big Five factors have been taken to clas- 
sify behavior genetic findings with respect to adult personality (e.g., Bouchard, 
1993). In psychotherapy the Big Five model can be utilized to facilitate psychothe- 
rapy treatment (e.g., Miller, 1991). Van Dam (1996) shows that the Big Five model 
provides a useful framework for understanding the ways selectors perceive the personalities
of job applicants.

Finally, Buss (1991) embraced the Big Five factors as the most important dimen- 
sions of the ‘social landscape’ to which humans had to adapt: they are considered to 
be the dimensions along which people act upon differences in others, which is, from 
an evolutionary perspective, crucial for solving problems of survival and reproduction
(cf. Buss, 1996).


The Big Five constructs 


The Big Five constructs, Extraversion, Agreeableness, Conscientiousness, Emotional
Stability, and Intellect/Autonomy, made a long journey, covering about a whole
century, towards a strong performance in the psychological arena during the last
decade of the twentieth century. A count of the references made to each of the presently
identified Big Five constructs in abstracts during that century provides an
interesting picture of the appreciation of the pertinent constructs. The total number
of 17,262 references found in the relevant abstracts, made available on CD-ROM,
accredits Extraversion (and Introversion) and Neuroticism (and Emotional Stability) as
absolute winners with totals of about 8,500 and 6,200, respectively. This picture
sustains the historical "Big Two" of temperament (Wiggins, 1968). The historical
third, Intellect, with about 1,500 references, may refer to both traits and abilities.
Agreeableness and Conscientiousness have played a role of importance only in
the last one or two decades. The counts for those constructs are around 500.


Extraversion and Introversion 


Guilford and Braly (1930) already remarked that "No single pair of traits of personality
has been quite so widely discussed and studied as that of extroversion and introversion"
(p. 96). Though there are references in the literature to these traits before Jung, their
main understanding at the onset of their appearance was Jungian. To Jung (1917)
Extraversion is the outward turning of psychic energy toward the external world,
while Introversion refers to the inward flow of psychic energy towards the depths of
the psyche. Extraversion is denoted by habitual outgoingness, venturing forth with
careless confidence into the unknown, and being particularly interested in people 
and events in the external world. Introversion is reflected by a keen interest in one's 
own psyche, and often preferring to be alone. 

Extraversion is a dimension in almost all personality inventories of a multidimen- 
sional nature, which fact sustains its relevance and its substantive character. More- 
over, many studies have provided behavioral correlates of this construct (e.g., 
Watson & Clark, 1997), such as the number of leadership roles assumed, and fre- 
quency of partying, and also nonverbal decoding skills in social interaction, but only 
when this is a secondary task (Lieberman & Rosenthal, 2001). Mak and Tran (2001) 
provided evidence of the relevance of Extraversion for intercultural social self- 
efficacy. Extraversion has also been found to predict employees’ absenteeism 
(Judge, Martocchio & Thoresen, 1997), the use of networking as a job-search method
among the unemployed (Wanberg, Kanfer, & Banas, 2000), and the objective sales
volume and managerial ratings of salesperson performance (Vinchur, Schippmann,
Switzer, & Roth, 1998). Extraversion has been found to be positively related to level 
of salary and promotions (Seibert & Kraimer, 2001). Extraversion is also very rele- 
vant in contexts of learning and education (De Raad & Schouwenburg, 1996), and 
the construct appeared to be related to various health-related behaviors (Scheier & 
Carver, 1987). For example, it predicts subjective well-being (DeNeve & Cooper, 
1998), at midlife (Siegler & Brummett, 2000) as well as for centenarians (Adkins, 
Martin, & Poon, 1996); and it is negatively associated with avoidance coping in 
dealing with cardiac catheterisation (Bosworth, Feaganes, Vitaliano, Mark, & Sieg- 
ler, 2001). 


Agreeableness 


Agreeableness is the personality dimension with the briefest history. This may come
as a surprise, since long-standing constructs such as Love and Hate, Solidarity, Conflict,
Cooperation, and Kindness are part and parcel of this dimension. While those different
constructs may have been pivotal to the organization of social life throughout the
history of mankind, Agreeableness as a personality dimension essentially popped up with the rise of
the Big Five. Graziano and colleagues have described the details of the history of 
this construct (e.g., Graziano & Eisenberg, 1997). The Agreeableness dimension is 
probably the most concerned with interpersonal relationships. Wiggins (1991) theo- 
rizes about Agreeableness as being dominated by ‘communion’, which is the condi- 
tion of being part of a larger spiritual or social community (cf. Hogan, 1983). 

In the interpersonal domain there are several correlates of Agreeableness, inclu- 
ding more elevated ratings of peer performance on group exercises (Bernardin, 
Cooke, & Villanova, 2000), interpersonal skills in teams (Neuman & Wright, 1999), 
and several aspects of social relationships between university students (Asendorpf & 
Wilpers, 1998). Agreeable persons select tactics that minimize disruption during 
conflict episodes, and they continue to talk more with their conflict partners after a 
conflict (Jensen-Campbell & Graziano, 2001). McCullough, Bellah, Kilpatrick, and 
Johnson (2001) found that Agreeableness was negatively related to vengefulness. In 
health psychological research, Agreeableness also plays a documented role. For instance,
coronary heart disease is more likely to develop in competitive and hostile
people than in those who are more easygoing and patient (cf. Dembroski & Costa, 
1987; Graziano & Eisenberg, 1997). Moreover, agreeable people are less likely to 
engage in risky health behaviours and are more optimistic about their future health 
risks (Vollrath, Knoch, & Cassano, 1999). While Tett, Jackson and Rothstein (1991) 
conclude that personality measures in general have a place in the personnel selection 
research (cf. Barrick & Mount, 1991), several studies support the specific role of 
Agreeableness, for instance, as a predictor of training proficiency (e.g., Salgado, 
1997). 


Conscientiousness 


Conscientiousness has been drawn upon as a resource in situations where achieve- 
ment is an important value, that is, in contexts of work, learning and education. The 
construct represents the drive to accomplish something, and it contains the characte- 
ristics necessary in such a pursuit: being organized, systematic, efficient, practical, 
and steady (cf. Goldberg, 1992). 

There is an impressive list of studies emphasizing the importance of conscien- 
tiousness and related facets in learning and education. Successful boys at grammar 
school received higher ratings in persistence than unsuccessful boys (e.g. Astington, 
1960). Smith (1967) found ‘strength of character’ to be an important nonintellective
correlate of academic success. Wiggins, Blackburn, and Hackman (1969) found con- 
scientiousness among the best predictors of graduate success. Conscientiousness 
plays a prominent role in the HSPQ (Cattell, Cattell, & Johns, 1984) as predictor of 
school grades (e.g. Schuerger & Kuna, 1987). Recently, Wolfe and Johnson (1995) 
have supported the role of conscientiousness in prediction of school performance, 
whereas in a longitudinal study Shiner (2000) has shown that academic conscien- 
tiousness in childhood is predictive of both academic achievement and conduct ten 
years later. In organizational settings, reviews provided by Barrick and Mount 
(1991), Tett, Jackson, and Rothstein (1991), Ones, Viswesvaran, and Schmidt (1993),
and Salgado (1997), have led to the conclusion that Conscientiousness is consis- 
tently related to job performance criteria (cf. Hogan & Ones, 1997). In health beha- 
vior research, Conscientiousness has been shown to play an important role in pre- 
dicting a range of important outcomes such as longevity (Friedman, Tucker, 
Schwartz, Martin et al., 1995), smoking (Hampson, Andrews, Barckley, Lichtenstein,
& Lee, 2000), mammography utilization (Schwartz, Taylor, Willard, Siegel,
Lamdan, & Moran, 1999), physical fitness (Hogan, 1989), and lower risky health 
behavior (Lemos-Giraldez & Fidalgo-Aliste, 1997; Vollrath, Knoch, & Cassano, 
1999). In the area of antisocial behavior, Conscientiousness plays a role as well. 
Heaven (1996) reported Conscientiousness to be negatively related to vandalism, 
and Clower and Bothwell (2001) found Conscientiousness to be negatively related 
to inmate recidivism. 


Emotional Stability and Neuroticism 


The first inventory measuring neurotic tendencies is Woodworth's (1917) Personal 
Data Sheet, developed during World War I to assess the ability of soldiers to cope 
with military stresses. Being emotionally stable has been considered a requirement
in such demanding situations as those in the air force (cf. Henmon, 1919) and the
police force (cf. Graf, 1924). Thurstone and Thurstone (1930) developed a neurotic
inventory called "A Personality Schedule" to assess the neurotic tendencies of university
freshmen. Their inventory was in part based on Woodworth's (1917) instrument.
As one of the "Big Two", Neuroticism, often also referred to as Anxiety, had
been observed by Wiggins (1968) most notably in the works of Eysenck (1957),
Cattell (1957), Guilford (1959), and Gough (1957).

Neuroticism has been found relevant as a predictor of school attainment (e.g.,
Entwistle & Cunningham, 1968; Eysenck & Cookson, 1969). At a university level,
highly neurotic students are probably handicapped as compared to students low in neuroticism. In
the organizational context, Emotional Stability turned out to be a good predictor of
job performance and job satisfaction (Judge & Bono, 2001) and of higher status in 
social groups (Anderson, John, Keltner, & Kring, 2001). In the social context, Neu- 
roticism has been found detrimental to commitment in a relationship (Kurdek, 1997) 
and to the level of marital satisfaction (Karney & Bradbury, 1997). McCullough et
al. (2001) found that Neuroticism was positively related to vengefulness. In the clinical
situation, there is strong evidence of the relevance of Neuroticism in the assessment
of personality disorders (cf. Schroeder, Wormworth, & Livesley, 1992). Neuroticism
correlates significantly with various measures of illness (cf. Costa &
McCrae, 1987; Friedman & Booth-Kewley, 1987). There is evidence that Neuroticism
is involved in processes described in illness behavior models (e.g. Larsen,
1992). It is a strong predictor of psychological distress (Ormel & Wohlfarth, 1991),
predicts both positive and negative mood (David, Green, Martin, & Suls, 1997), and
it is associated with higher interest in social comparison and with less favorable reactions
in cancer patients (Van der Zee, Oldersma, Buunk, & Bos, 1998).


Intellect and Openness to Experience 


Feelings usually run highest for the Fifth of the Big Five (De Raad & Van
Heck, 1994). This concerns its naming but also its origin and its relevance as a
personality trait factor. In a sense, discussions with respect to this factor incorporate
the various points of criticism that are expressed over the Big Five as a model.
While several candidates for factor five have been suggested, including Culture
(Tupes & Christal, 1961; Norman, 1963), Intelligence (Borgatta, 1964), Intellectance
(Hogan, 1983), and Imagination (Saucier, 1994), the main dispute over the Fifth
of the Big Five concerned the lexical versions of factor five, on the one hand, and on
the other hand, Openness to Experience as developed for the NEO-PI (Costa &
McCrae, 1985).

In assessment situations it has been mainly the Openness to Experience concepti- 
on that established the relevance and significance of the Fifth of the Big Five. This 
factor may be relevant in psychiatry and clinical psychology. Aspects of Openness 
to Experience seem to be related to several disorders (Costa & Widiger, 1994) and to 
high-risk health behavior (Booth-Kewley & Vickers, 1994). In the more task- 
oriented contexts the relevance of the Fifth has also been pointed out. In contexts of 
learning and education, Openness to Experience has been related to learning strate- 
gies. Learning strategies possibly mediate a relationship between Openness to Expe- 
rience and grade point average (cf. Blickle, 1996). In organizational settings, Open- 
ness to Experience has been associated with increased creative behavior (George & 
Zhou, 2001) and job performance (Bing & Lounsbury, 2000; Dollinger & Orf, 1991;
Salgado, 1997), and it was negatively related to level of salary (Seibert & Kraimer, 
2001). Mak and Tran (2001) provided evidence of the relevance of Openness to Ex- 
perience for intercultural social self-efficacy. Clower and Bothwell (2001) found 
low Openness to Experience to be related to inmate recidivism. 


Facets of the Big Five 


The Big Five factors represent a broad level of personality structure, in which gene- 
rality is emphasized at the cost of specificity. There is no guarantee that those broad 
factors do exhaust all significant personality dimensions. While there is evidence 
that some specific personality dimensions can be conveniently accommodated wit- 
hin the Big Five framework, and others can be understood as facets of the Big Five 
or as combinations of them (Johnson & Ostendorf, 1993), there is discussion over 
whether some other specific dimensions (e.g., honesty, reciprocity, morality) may be 
placed within the Big Five framework (see Paunonen & Jackson, 2000; Saucier & 
Goldberg, 1998). 

Two main approaches, the so called hierarchical and the circumplex approaches, 
have been proposed to model the different levels of specificity versus generality, in 
particular in terms of the distinction between facets and factors (Perugini, 1999). 
Both approaches recognize that the Big Five factors are better thought of as broad 
personality dimensions subsuming several more specific dimensions. These specific 
dimensions or facets can be either considered as hierarchically nested in the Big Five 
or as blends of the Big Five. 

Under the hierarchical approach facets are often considered as first order factors, 
and the Big Five as second order factors. The NEO-PI-R, for example, specifies six 
facets for each of the five factors, and the BFQ specifies two facets per factor. Even 
though facets can have some secondary and tertiary loadings, they are supposed to 
load primarily on a specific Big Five factor and, as such, to represent the specific 
personality traits that form the core of the more general Big Five factor. 
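
The hierarchical arrangement described above can be illustrated with a small sketch. It assumes, purely for illustration, that a factor score is the unweighted mean of its facet scores; the facet labels and the 1-5 scores are hypothetical, and only the facet counts per factor (six for the NEO-PI-R, two for the BFQ) come from the text.

```python
from statistics import mean

# Facet counts per factor as stated in the text.
FACETS_PER_FACTOR = {"NEO-PI-R": 6, "BFQ": 2}
N_FACTORS = 5


def total_facets(instrument: str) -> int:
    """Number of first-order facets when every factor has the same facet count."""
    return FACETS_PER_FACTOR[instrument] * N_FACTORS


def factor_score(facet_scores: dict) -> float:
    """Second-order score as the unweighted mean of facet scores (an assumption)."""
    return mean(facet_scores.values())


# Hypothetical facet labels and scores, for illustration only.
person_a = {"facet_1": 3.0, "facet_2": 3.0}
person_b = {"facet_1": 4.5, "facet_2": 1.5}

print(total_facets("NEO-PI-R"), total_facets("BFQ"))
print(factor_score(person_a), factor_score(person_b))
```

The two hypothetical persons obtain the same factor score from different facet profiles, which is the kind of information that is lost when only the broad factor is reported.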

The circumplex approach represents a finer-grained configuration distinguishing 
90 segments in the so-called Abridged Big Five Circumplex (AB5C; Hofstee et al., 
1992). In this model, facets are constituted as blends of two factors, based on the 
observation that many traits are most adequately described by two substantial loa- 
dings instead of just one. The 90 segments include 10 segments each containing 
traits that load on only one of the factor poles, and 80 segments containing the 
blends of the factor poles with all the poles of the other four factors. The flavor of 
each separate Big Five factor is given by both the two factor-pure facets and by the 
16 facets based on blends. Because of its explicit representation of the trait domain, 
this circumplex model provides an excellent starting point for the development of 
personality assessment instruments. 
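
The segment counts given above follow from simple enumeration. The sketch below reproduces the arithmetic only; the pole labels (I+, I-, and so on) and the convention that the first element of a pair carries the primary loading are assumptions made for the illustration.

```python
from itertools import product

FACTORS = ["I", "II", "III", "IV", "V"]
SIGNS = ["+", "-"]

# Ten factor poles: I+, I-, ..., V+, V-.
poles = [factor + sign for factor, sign in product(FACTORS, SIGNS)]

# Factor-pure segments: traits loading on a single factor pole.
pure = [(pole, pole) for pole in poles]

# Blends: a primary pole combined with each pole of the other four factors.
blends = [(a, b) for a in poles for b in poles if a[:-1] != b[:-1]]

segments = pure + blends


def facets_of(factor):
    """Segments in which `factor` supplies the primary loading."""
    return [seg for seg in segments if seg[0][:-1] == factor]


print(len(pure), len(blends), len(segments))
print(len(facets_of("III")))
```

Each factor thus contributes two factor-pure segments and sixteen blends in which it is primary, eighteen in total.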

Notwithstanding the differences in orientation and emphasis, under both approa- 
ches a distinction is drawn between Big Five factors and their facets. One intriguing 
question is therefore whether there is any utility in assessing both. This question is 
considered here from three perspectives: theoretical, structural, and predictive. 

Theoretically, the role of facets is very important. The specific spectrum of each 
Big Five factor is not easily conveyed at the broader (factor) level. While definitions 
of each of the five factors are readily available and much research has been pro- 
duced on linking those factors to an impressive range of criteria, knowledge of the 
specific facets of each factor allows for a finer-grained understanding of the perso- 
nality lexicon and such understanding facilitates criteria-linkage. As argued by Sau- 
cier and Ostendorf (1999), a representation combining broad and narrow constructs 
offers a good compromise between efficiency (or parsimony) and fidelity. More- 
over, the theoretical structure is much clearer and likely to contribute to an increased 
understanding of the functioning of personality dimensions. In fact, although the 
psychological mechanisms linked to a broad personality dimension can be also spe- 
cified at a general level, it is at the facet level that this specification can be accom- 
plished best. For example, self-regulation is central to Conscientiousness, but it is 
likely to be especially important for facets such as Organization rather than Perfecti- 
onism. Furthermore, a specification of both facets and broader factors allows for a 
better understanding of the dynamics of a personality profile: except for people with 
extreme scores, usually there is substantial variation in the contribution of each facet 
of a factor to the overall factor score. 

Structurally, there is hardly any difference in the overall Big Five structure when 
adding facets to the factors. The advantage is that adding facets provides for a strati- 
fication of the universe of traits, which can be used to develop assessment instru- 
ments that are balanced as regards representative sampling of items from this uni- 
verse (cf. De Raad & Hendriks, 1997). The Five Factor Personality Inventory, for 
example, makes use of this property in order to provide a Big Five structure that is 
balanced as regards the representation of both pure factors and blends. 

At the predictive level, the situation is less clear-cut. Some researchers have ar- 
gued that the assessment of broad factors is preferable over narrower facets, especi- 
ally in situations where the criteria to be predicted are also broad and complex (Ones 
& Viswesvaran, 1996). Others have argued that the specific variance accounted for 
by narrower facets possibly increases the prediction of relevant criteria, even when 
those criteria are broad and complex (Paunonen, 1998; Paunonen & Ashton, 2001). 
Facets can uniquely contribute because they may contain reliable variance that is not 
shared with the broader trait to which they belong but that is shared with some rele- 
vant criteria. It is just an empirical matter whether this addition would be useful. 
Paunonen and Ashton (2001) have produced evidence that this is indeed the case. 
They compared broad Big Five factors with their facets in terms of predictive power 
across a broad range of behavioral criteria. The results clearly demonstrated a pre- 
dictive gain with the addition of carefully selected facets to broad factors. 
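
The argument about facet-specific variance can be made concrete with a simulation. The following is a minimal sketch on synthetic data, not a reanalysis of Paunonen and Ashton (2001): the sample size, the number of facets, and all coefficients are arbitrary assumptions. It compares the variance in a criterion explained by a broad factor score alone with the variance explained when one facet is added.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit that includes an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()


rng = np.random.default_rng(0)
n = 500
common = rng.normal(size=n)            # variance shared by all facets
specific = rng.normal(size=(n, 3))     # reliable facet-specific variance
facets = common[:, None] + specific    # three synthetic facets
factor = facets.mean(axis=1)           # broad factor score

# Synthetic criterion that also depends on the specific part of the first facet.
criterion = 0.3 * common + 0.6 * specific[:, 0] + rng.normal(size=n)

r2_factor = r_squared(factor[:, None], criterion)
r2_both = r_squared(np.column_stack([factor, facets[:, 0]]), criterion)
gain = r2_both - r2_factor

print(f"factor only: {r2_factor:.3f}  factor + facet: {r2_both:.3f}")
```

Because the second model nests the first, its in-sample fit can never be lower; whether the gain is large enough to matter, and whether it survives cross-validation, remains the empirical question raised in the text.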

It may be wise for researchers to consider using facets in addition to the Big Five 
factors. The potential benefit of using facets often outweighs the costs, and the theo- 
retical and practical gains seem sufficient to justify such a more demanding choice 
in most cases. When feasible, researchers might pay specific attention to 
the lower-level features of the Big Five and select those that would be most approp- 
riate for the given research question. The attention does not need to be unduly re- 
stricted to the Big Five dimensions: there are research contexts where other specific 
dimensions could be relevant. However, a measure of the Big Five should be routi- 
nely used, as this would guarantee scientific progress and cumulative knowledge. 


Trait assessment instruments 


There are some obvious and striking differences between the Big Five and related 
assessment instruments that are generally of interest. One of the pertinent characte- 
ristics is the number of items and the amount of time necessary to fill them out. 
Since this information is not always specified, we generalize the average on the ba- 
sis of the known figures, yielding about six to seven items filled out per minute. This 
would, for example, be about ten minutes for the brief 50-item BFMS or the 60-item 
nonverbal FF-NPQ, 15 minutes for the FFPI, 20 minutes for the HiPIC, 25 minutes 
for the TPQue, 35 minutes for the NEO-PI-R, and 45 minutes for the GPI and the 
ACL. The average internal consistency of the Big Five and related scales across the 
instruments in this volume is just over .80. In general there are somewhat higher 
values with increasing number of items, and lower values with decreasing number of 
items, with favorable exceptions, for example, for the FF-NPQ and the FFPI. 
Reasons for choosing a certain instrument may depend on various considerations, 
such as amount of time available, costs of the instrument, difficulty of item- 
formulation, context of use, validity-information, etc. In the following paragraph, 
some characteristic features of the various instruments in this volume are reviewed, 
without trying to provide a systematic overview. The readers are urged to go to the 
relevant chapters and to make their own comparison. 
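
The time estimates above can be approximated from item counts. The sketch below uses the midpoint of the stated range of six to seven items per minute and only the item counts mentioned in this chapter; the results are rough and will not match the rounded figures in the text exactly. The second function is the Spearman-Brown formula, which is not taken from this chapter and is added here, under its usual assumption of parallel items, to illustrate why internal consistency tends to rise with scale length.

```python
ITEMS_PER_MINUTE = 6.5  # midpoint of the "six to seven" range given in the text


def minutes_to_complete(n_items, rate=ITEMS_PER_MINUTE):
    """Rough administration time in minutes for a questionnaire of n_items."""
    return n_items / rate


def spearman_brown(reliability, length_factor):
    """Predicted reliability when a scale is lengthened by `length_factor`.

    Assumes the added items are parallel to the existing ones.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Item counts mentioned in this chapter.
item_counts = {"BFMS": 50, "FF-NPQ": 60, "FFPI": 100, "NEO-PI-R": 240}

for name, n_items in item_counts.items():
    print(f"{name}: roughly {minutes_to_complete(n_items):.0f} minutes")

# Doubling or halving a scale whose reliability is .80.
print(round(spearman_brown(0.80, 2.0), 2), round(spearman_brown(0.80, 0.5), 2))
```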


Big Five trait-markers 


Possibly the most direct way to arrive at an instrument assessing the Big Five is to 
select trait-variables as markers of the Big Five, on the basis of their loadings on 
those factors. This could yield assessment instruments comparable to that of Norman 
(1963). Simply taking the first n highest loading trait-variables per factor might do 
the job. A frequently used marker list to measure the Big Five is the one described in 
Norman (1963). The list is based on earlier work by Cattell (1947). Other lists have 
often been produced either as a stand-alone effort or as a by-product of psycholexi- 
cal studies. For the history of this and of similar constructs from the same period, as 
well as for a comprehensive coverage of many psycholexical studies, see De Raad 
(2000). 
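
The simplest selection rule mentioned above, taking the n highest loading variables per factor, might be sketched as follows. The adjectives and loadings are hypothetical, and ranking by absolute loading (so that markers of the negative pole are also eligible) is an assumption of this sketch rather than a rule stated in the text.

```python
import numpy as np


def top_n_markers(loadings, names, n):
    """For each factor (column), return the n variables with the largest absolute loading."""
    markers = {}
    for j in range(loadings.shape[1]):
        order = np.argsort(-np.abs(loadings[:, j]))[:n]
        markers[j] = [names[i] for i in order]
    return markers


# Hypothetical loadings of six adjectives on two factors.
names = ["talkative", "bold", "quiet", "kind", "warm", "moody"]
loadings = np.array([
    [0.70, 0.10],
    [0.55, -0.05],
    [-0.65, 0.05],
    [0.05, 0.72],
    [0.20, 0.60],
    [0.30, 0.35],
])

markers = top_n_markers(loadings, names, n=2)
print(markers)
```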

Saucier and Goldberg (this volume) review ten potential criteria that might be 
used to select markers. They indeed emphasize criteria consistent with the factor 
analytic strategy of selecting items with high loadings on the targeted factor and low 
loadings on other factors. Most of their criteria are consistent with classical test the- 
ory, and some are more consistent with item response theory (IRT). Goldberg (1992) 
developed a list of 100 ‘unipolar’ markers for the Big Five, which is to be conside- 
red as illustrative of some of those criteria. The alphabetical list can be found as Ap- 
pendix A in Goldberg (1992), together with an instruction and a rating scale for self- 
description. In his 1992 article Goldberg concludes: “It is to be hoped that the avai- 
lability of this easily administered set of factor markers will now encourage investi- 
gators of diverse theoretical viewpoints to communicate in a common psychometric 
tongue". This list of Big Five markers necessarily carries the restrictions that are in- 
herent in utilizing the single language of American English and the corresponding 
Big Five structure as a standard for other languages. Saucier and Goldberg (this vo- 
lume) also discuss criteria such as brevity, orthogonality and differentiation, and 
they provide illustrative examples of the application of those criteria. 
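
The factor-analytic criterion of a high loading on the targeted factor combined with low loadings elsewhere can be expressed as a simple filter. This is a sketch only: the two thresholds and the example loadings are hypothetical and are not values proposed by Saucier and Goldberg.

```python
import numpy as np


def clean_markers(loadings, names, min_primary=0.50, max_secondary=0.25):
    """Keep variables with a high primary loading and low loadings on all other factors."""
    keep = {}
    abs_loadings = np.abs(loadings)
    for i, name in enumerate(names):
        target = int(abs_loadings[i].argmax())
        secondary = np.delete(abs_loadings[i], target).max()
        if abs_loadings[i, target] >= min_primary and secondary <= max_secondary:
            keep.setdefault(target, []).append(name)
    return keep


# Hypothetical loadings of five adjectives on two factors.
names = ["talkative", "quiet", "kind", "cheerful", "moody"]
loadings = np.array([
    [0.70, 0.10],
    [-0.65, 0.05],
    [0.05, 0.72],
    [0.55, 0.45],   # substantial secondary loading: a blend, not a clean marker
    [0.30, 0.35],   # no loading reaches the primary threshold
])

selected = clean_markers(loadings, names)
print(selected)
```

Variables rejected by such a filter are not necessarily poor items; in a circumplex framework a variable like the hypothetical "cheerful" would be treated as a blend of two factors.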


Big Five inventories, questionnaires, and adjective scales 


Many instruments have been developed to assess the Big Five factors. Notwithstanding 
the fact that those instruments, of which the majority is presented in this volume, 
purport to assess essentially the same constructs, they differ in various respects. 
They may highlight the particulars of a specific language or culture (e.g., 
NEO-PI-R in Icelandic), focus on assessing specific age groups (e.g., HiPIC), emphasize 
alternative media of communication (e.g., NPQ) or specific uses in the assessment 
process (e.g., SIFFM), provide a different theoretical perspective (e.g., a dyadic 
interactional perspective), or be especially detailed about certain psychometric 
issues (e.g., Kashiwagi, this volume). 

Costa and McCrae's (1985; 1992) NEO-PI-R is the most frequently used perso- 
nality questionnaire to assess the Big Five. Although it has been influenced by and 
geared towards the early formulation of the Big Five model (Norman, 1963), it has 
not been developed within the psycholexical tradition. The development of the N 
(Neuroticism), E (Extraversion), and O (Openness to Experience) scales started with 
Costa and McCrae's (1976) cluster analyses of 16PF scales, the intercorrelations of 
which led to the NEO. After becoming acquainted with an early Big Five formulation, 
Costa and McCrae added Agreeableness and Conscientiousness to their model, as- 
suming that their N, E, and O captured the first three of the Big Five. The NEO-PI 
(Costa & McCrae, 1985) included scales to assess six facets of Neuroticism, Extra- 
version, and Openness to Experience. Only the 240-item NEO-PI-R (Costa & 
McCrae, 1992) also included six facets of Agreeableness and Conscientiousness. 

An application of the Big Five in which an optimal coverage of the semantics of the 
Big Five system was realized can be found in the development of the Five Factor 
Personality Inventory (FFPI; Hendriks, 1997; Hendriks, Hofstee, & De Raad, 1999). Use 
was made of the fine-grained AB5C-segmentation of the trait sphere, a system that 
can be conceived of as an empirically based partitioning into facets, the majority of 
which contain a semantically more or less coherent cluster of traits. These facet 
clusters were used as the starting-point for item generation. A pool of 914 items that 
was agreed upon to represent the AB5C system was made available with approximately 
identical phrasings in Dutch, German, and English. Items were only accepted 
for the final pool if clear, unambiguous translations in those languages could be 
found. The final instrument is trilingual in nature and is pruned to 100 items. The 
items have a simple and easy to understand behavioral format, put in the third person 
singular, which makes them suitable for both other-ratings and self-ratings. 

The Big Five Questionnaire (BFQ) has been developed for the assessment of the 
Five Factor Model of personality traits using a top down approach, that is by first 
defining the five dimensions, then the most important facets for each factor, and fi- 
nally by producing items to assess these constructs (Caprara, Barbaranelli, 
Borgogni, & Perugini, 1993). The BFQ was developed alongside the first psycho- 
lexical study in the Italian context (Caprara & Perugini, 1994) and took into account 
some of the original findings, especially for the definition of the first factor. In fact, 
in the BFQ the first factor is defined as Energy, which is also based on a careful 
scrutiny of the adjectives defining factor I in the psycholexical study. 

Research on the structure of childhood traits is scarce. As part of an international 
research project (Kohnstamm et al., 1998), Mervielde and De Fruyt assembled an 
extensive list of items by which parents describe their school-age children. That pool 
of items, the classification of which largely took place within the confines of Big 
Five categories, ultimately led to the development of the HiPIC, the Hierarchical 
Personality Inventory for Children. 

Clinicians and researchers in the domain of personality disorders at times prefer 
semi-structured interviews over self-report because they allow 
the evaluation of the practitioner to be included in the assessment. The SIFFM was 
developed to provide such an interview-based measure. The measure was designed 
to assess both adaptive and maladaptive characteristics related to Big Five traits. The 
SIFFM is especially meant as a complement to a Big Five self-report, mainly because 
self-reports may be affected by current mood-states and by other self-related 
biases, especially in clinical settings. 

A point of controversy with respect to verbal self- and other-ratings is that they may 
reflect consistencies in language rather than consistencies in observed behavior. For 
this reason, Paunonen, Ashton, and Jackson (2001) developed an instrument that did 
not make use of verbal items, but included cartoon-like pictures, in which a person 
performs specific behaviors in specific situations. Paunonen and Jackson (1979) 
initially developed a nonverbal item pool for a person perception study, aiming to 
represent traits of Murray's (1938) system of needs. From this item pool a subset of 
items was selected to form the Nonverbal Personality Questionnaire (NPQ). With a 
few exceptions, items were selected from the NPQ to form the shorter FF-NPQ, 
measuring each of the Big Five factors. 

A common practice among practitioners and researchers doing cross-cultural 
work is to transport personality inventories developed in one country to another 
country of interest. The issue involved is expressed in Berry's (1969) distinction between 
emic and etic structures. Imposing a personality trait structure developed in 
one language (emic) as "universally applicable" (etic) in another language is not 
without danger. The Global Personality Inventory (GPI) involved input from some 
ten teams of consultants and researchers from around the world, with quick consensus 
reached on the importance of the Big Five structure, and more rounds of input 
needed to reach consensus at the facet level. All the teams participated in the 
production of an item-pool. The items of a long first version of the GPI were 
translated into nine languages and data were collected in many diverse countries to be 
used for the selection of the final set of GPI items. 

The Traits Personality Questionnaire (TPQue) is a questionnaire that was heavily 
influenced by the NEO-PI-R in the construction stage. Starting from an a priori 
conception of the Big Five model of personality, in particular the NEO-PI-R, an 
item-pool was constructed for the development of the TPQue, to be used to assess 
sub-scales and factor scales of the Big Five in the Greek language. A relatively large 
set of items was used in a pilot study, and the final list of items was selected on the 
basis of their being optimal markers of scales and sub-scales. 

A major part of the Big Five domain is of an interpersonal character; particularly 
the factors Extraversion and Agreeableness represent traits and behaviors with 
strong interpersonal connotations. Wiggins (1979) used a category of approximately 
800 interpersonal trait terms, stemming from psycholexical research of traits, for an 
initial description of an interpersonal taxonomy. A two-dimensional circumplex mo- 
del was used for representational purposes. These two dimensions are strongly rela- 
ted to the E and A factors of the Big Five model. After the development of the Inter- 
personal Adjective Scales measuring the two dimensions, a revision followed in the 
IASR. The latter instrument was revised not only to provide a short-form measure of 
the IAS, but also to include the three remaining Big Five dimensions. 

Very often researchers would like to use a Big Five measure, and ask for the 
briefest form possible. Brief and adequate measures are not easily realized. The Big 
Five Marker Scales (BFMS) form an exception. Developed from a set of trait adjec- 
tives common to two independent trait taxonomies in Italian, markers for each of the 
Big Five in Italian were selected for which very reasonable psychometric character- 
istics could be established. These Italian Big Five markers have been iteratively se- 
lected using an emic-etic analytic strategy. Besides representing a quick overall 
measure of the Big Five, the BFMS provides a structure into which the whole spec- 
trum of the personality lexicon can be represented. The resulting taxonomy shares 
much with the AB5C system, although it differs in a few details, and it can also be used 
for robust comparisons across cultural contexts at the facet level. 

The Japanese Adjective List (JAL) for the Big Five contains a final set of 105 
adjectives that optimally measure the Big Five in the Japanese context. 
Kashiwagi has carefully adopted a sophisticated factor analytic approach and is de- 
tailed in arguing why this approach leads to an optimal structure. While the proce- 
dure is certainly not standard, it does represent a fine example of how psychometri- 
cally sound techniques can be fruitfully applied to select best markers of the Big 
Five. 




Questionnaires related to or shaped after the Big Five 


The impact of the Big Five factors has been such that researchers often clarify the 
relations of their own alternative trait models with the Big Five. A few such alterna- 
tive models have been proposed, such as a Big Three (Peabody & Goldberg, 1989), 
a Big Six (Jackson, Ashton, & Tomes, 1996), a Big Seven factor model (Almagor, 
Tellegen, & Waller, 1995) and an alternative Five Factor model (Zuckerman, 1994). 
All these models share features with the Big Five but differ too. In addition, some 
classic personality inventories originally developed to measure some other persona- 
lity dimensions and still widely used throughout the world have been molded after 
the Big Five. 

The HPI (Hogan & Hogan, this volume) has a conceptual foundation in the So- 
cioanalytic theory that combines interpersonal theory and evolutionary theory. One 
of the unique characteristics of the HPI is that it has been developed using data from 
working adults, and it is designed to be especially apt for use in occupational set- 
tings. Although the Big Five model was used as the starting-point for the constructi- 
on of the inventory, it was concluded that seven dimensions were necessary to des- 
cribe the data. Those seven dimensions can be easily aligned with the Big Five, ei- 
ther directly or as a combination of them. This allows re-interpreting existing data in 
light of the Big Five as well as comparing findings. Emotional Stability, Agreeable- 
ness, and Conscientiousness of the Big Five can be identified in HPI Adjustment, 
Likeability, and Prudence. Extraversion can be retraced in both HPI Ambition and 
Sociability, and Intellect/Openness to Experience is represented in both HPI Intel- 
lectance and School Success. 

The Six Factor Personality Questionnaire (SFPQ; Jackson & Tremblay, this volume) 
was developed using scales of the Personality Research Form (PRF), a questionnaire 
constructed for the assessment of personality variables largely based on the 
work of Murray. Three of the six factors are identified as Extraversion, Agreeableness, 
and Openness to Experience. One departure is the Independence factor, whose 
name is related to the opposite pole of Neuroticism. The other departure is found in 
the two factors Methodicalness and Industriousness, which can be seen as a division 
of Conscientiousness. 

Zuckerman's alternative five-factorial model (ZKPQ) does not show a one-to-one 
correspondence with the Big Five. Yet, four of the ZKPQ factors can be readily in- 
terpreted in the Big Five framework. Big Five Intellect/Openness to Experience is 
not represented; instead the activity and energy facets of Big Five Extraversion have 
obtained more emphasis in a separate Activity factor, which seems to be more in line 
with the biological orientation of the ZKPQ. 

One of the most classical personality questionnaires, the 16PF, provides in its 
fifth edition five second-order factors which can be easily understood as variants of 
the Big Five. The 16PF is the end result of an effort to represent substantial parts of 
the original Allport and Odbert (1936) list of trait terms. That long list of several 
thousands of terms was represented at an intermediate stage by a list of 171 trait 
terms. A further reduction of the latter list was used as input for the development of 
the 16PF. The adult version of the 16PF has been evaluated in different settings, 
while alternative forms have been developed for specific contexts. In addition, ver- 
sions have been developed for age groups younger than 16. 

The Adjective Check List (ACL) was developed for use especially in research 
situations. FormyDuval Hill, Williams, and Bassett (this volume) describe how the ACL 
can be scored to assess the Big Five dimensions. The ACL started with a substantial 
number of the terms in Cattell's list of 171 trait terms, and to that selection other 
items were added to represent relevant concepts from Freud, Jung, Mead, and Murray. 

A more complex situation is offered by the MMPI, which is one of the most used 
personality inventories for psychopathological assessment, originally developed in the 
‘40s and recently refurbished (MMPI-2). Harkness and McNulty (1994) have deve- 
loped the so-called PSY-5 constructs starting from a pool of symptoms and characte- 
ristics of both normal and dysfunctional personality functioning leading to the iden- 
tification of 60 major topics in human personality. These fundamental topics have 
been used to generate five higher order aggregates having some resemblance with 
the Big Five. The PSY-5 scales can be obtained after ad hoc coding of the 567 
MMPI-2 items. Among these scales, there is a higher representation of emotionally- 
laden constructs, whereas the factors Conscientiousness and especially Openness 
to Experience (or Intellect) are less well covered. 

The PPQ originated from Kline's work on personality and from his attempt to offer 
a Big Five based instrument for the occupational context. Unfortunate circumstances, 
especially the early death of Paul Kline, have affected the further testing 
of the PPQ's properties and its availability to the research community. However, 
Barrett's description (this volume) of the PPQ possibly opens up avenues for further 
research with the instrument. 


Final comment 


Because the Big Five model has acquired the status of a reference-model, its uses 
can be expanded to those of systems of classification and clarification for descriptive 
vocabularies that are not developed from a Big Five perspective, in order to evaluate 
the comprehensiveness of the trait-semantics of those vocabularies. Moreover, the 
model is expected to play an important role in modern theory building, due to the 
fact that its five main constructs capture so much of the subject matter of personality 
psychology. A recurrent issue in this volume is the further specification of abstract 
and general factors. Many of the assessment instruments described in this volume 
make it possible to assess facets in addition to factor scales. Part of the specification 
procedures might profit from paying attention to systematic reviews of situational 
features. 

Many more instruments along the main Big Five theme will be developed in the 
near future, as translations of existing instruments or as instruments that are 
completely developed within particular languages. Efforts may especially be expected to 
specify facets of the Big Five that can be cross-culturally validated. The issue of 
cross-cultural generalizability is a recurrent one in the recent history of the Big 
Five. After an initial period where the Big Five have emerged in so many different 
languages and cultures, there is now more awareness of subtle differences that may 
exist between factors recovered in lingually different psycholexical studies. Whereas 
a core of consensus among the basic features seems to be agreed upon, there are still 
differences of opinion concerning the number of factors, rotational variants, culture- 
specific factors, and so on. Some authors have argued that a Big Three model may 
represent a cross-culturally more generalizable structure of personality (e.g., 
Peabody & Goldberg, 1989; Di Blas, Forzi, & Peabody, 2000; cf. Peabody & De 
Raad, 2000). Curiously, one of the historically strong Big Two dimensions, 
Neuroticism, is not emphasized in that system; that discrepancy deserves special 
attention. The few psycholexical studies where the cross-cultural generalizability 
has been tested (De Raad, Perugini, & Szirmák, 1997; De Raad, Perugini, 
Hřebíčková, & Szarota, 1998; Hofstee, Kiers, De Raad, Goldberg, & Ostendorf, 
1997) have produced an "annotated" Big Five at least. On the other hand, 
questionnaires developed within one culture, and validated in other cultures (e.g., NEO-PI-R, 
FFPI) have generally given impressive evidence supporting the stability of the Big 
Five. Once again, the emic-etic distinction plays quite a role in determining what the 
answer is to the issue of cross-cultural generalizability. Studies in more and different 
languages may add to this discussion; the African and the South American conti- 
nents, Arabic countries, and traditional communities with preserved languages, have 
not participated yet. 

Another issue concerns the sufficiency of the Big Five. Whereas we believe that 
the Big Five are a reasonably agreed upon system and for now the best working 
hypothesis, and that it is important to use them in as many research contexts as 
possible, it is by no means obvious that they are sufficient. For instance, a general factor 
related to Morality and Honesty tends to appear in several psycholexical studies 
when data are reanalyzed (Ashton & Lee, 2001; Ashton, Lee, & Son, 2000). More 
research in different languages should provide a test case; perhaps in some years 
researchers may need to reconsider their ideas about a Big Five personality system. 
In this respect, the Big Five model is better conceived of as a starting point for 
further research, instead of as an arrival point. 

Trait structures from different languages differ, and so do assessment instru- 
ments, imported or not. This conclusion is not dramatic; it is a challenge to cross- 
cultural research-programs to isolate and identify what is valid across cultural bor- 
ders, and to specify the particulars of the different cultures. A lot has yet to be done. 
The Big Five factor model has proven to be highly prolific in the construction of 
assessment instruments, notwithstanding the fact that its significance has only been 
recognized during the last decade of the twentieth century. Moreover, the Big Five 
factors are far from definitive, and the derived assessment instruments deserve con- 
stant attention and an open eye for new facets and features to be included, in the 
model as well as in its assessment. 
Assessing the Big Five: Applications of 10 
psychometric criteria to the development of 
marker scales 


Gerard Saucier 
Lewis R. Goldberg 


Introduction 


A factor is a parsimonious reduction of many observed variables into one hypotheti- 
cal variable, accomplished within a particular set of data. The Big Five personality 
factor structure (Goldberg, 1981; Saucier & Goldberg, 1996a, 1996b) involves five 
orthogonal (i.e., mutually uncorrelated) factors that capture the five largest sources 
of variance shared by the variables in fairly representative assemblages of personal- 
ity-attribute descriptors in a number of languages (e.g., English, German, Polish, 
Czech, Turkish). Whether the Big Five is the optimal cross-culturally generalizable 
taxonomic structure for human personality is still a matter of controversy (see 
Saucier & Goldberg, in press), but it is clearly a very useful structure. 
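
The notion of orthogonal factors that capture the largest sources of shared variance can be illustrated numerically. The sketch below uses principal components of a correlation matrix as a simple stand-in for factor analysis; it is not the procedure used in the studies cited above, and the data, sample size, and number of components are synthetic assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_people, n_vars, n_factors = 300, 12, 3

# Synthetic ratings generated from three latent sources plus noise.
latent = rng.normal(size=(n_people, n_factors))
weights = rng.normal(size=(n_factors, n_vars))
ratings = latent @ weights + rng.normal(size=(n_people, n_vars))

# Standardize the variables, then decompose their correlation matrix.
z = (ratings - ratings.mean(axis=0)) / ratings.std(axis=0)
corr = np.corrcoef(z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)

# Keep the components with the largest eigenvalues (largest sources of variance).
order = np.argsort(eigvals)[::-1][:n_factors]
scores = z @ eigvecs[:, order]

# Component scores are mutually uncorrelated, i.e., orthogonal.
score_corr = np.corrcoef(scores, rowvar=False)
print(np.round(score_corr, 6))
print(np.round(eigvals[order] / n_vars, 3))  # proportion of variance per component
```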

Once a useful set of factors like the Big Five is discovered, it is expedient to ex- 
tend them beyond the particular set of data in which they were first located. How 
might one do so? One option would be to readminister the entire set of variables that 
led to the factors and repeat the analysis in new samples. But in the case of factors 
based on large numbers of variables (like the Big Five), this is quite inconvenient. 
Instead, it would be desirable to discover a relatively small set of variables that will 
consistently produce the structure: a set of factor "markers." 

This chapter describes various marker sets developed by the authors for the Big 
Five and related structures. We present these marker sets within a broader concep- 
tual framework, reviewing 10 diverse psychometric criteria by which marker sets 
can be developed and evaluated. Because constructing a set of factor markers is 
typically an item-reduction exercise (i.e., selecting an optimal set of items from a 
larger item pool), we focus on the item selection process. The principles and issues 
we discuss are important to personality-test construction in general, in most cases 
applying also to scales that are not factor-analytically derived as are Big Five scales. 

To conserve space, we will generally provide a summary of our scale develop- 
ment procedures, and then refer the reader to published articles on these marker sets; 
in the case of our new unpublished marker sets we will provide more detail. 


Item phrasing: Two widely recognized basics 


Criterion 1: Clearly understandable items 


Unless one has an explicit interest in collecting responses to ambiguous stimuli (as 
in projective instruments), the meaning of an item should be relatively unambiguous 
to the respondents. Assuming one's stimuli include words, a clear and easy to under- 
stand item is one that uses familiar rather than difficult vocabulary, and simple 
phrases lacking conjunctions which make items "double-barrelled." Indeed, from 
this standpoint single words (e.g., adjectives) might be considered superior, but they 
have one important limitation. Single words are often polysemous (i.e., they have 
multiple meanings) and thus somewhat ambiguous; a good item subdues rather than 
aggravates this tendency. 

Those items that meet various other criteria we describe later, such as being 
highly associated with other items or having high loadings on factors, tend to be 
clear and unambiguous ones. At the outset of item selection, however, the investi- 
gator may save much time and effort by identifying and eliminating the most unclear 
and difficult items. In lexical studies that have led to the Big Five structure, this 
elimination process has been built into the initial process of reducing the number of 
variables from thousands to hundreds in preparation for data collection. 


Criterion 2: Balanced keying 


Imagine that all of the items indexing an attribute were formulated so that the keyed 
response (the one that contributes to a high rather than a low score) involved the 
same response option (e.g., "True" rather than "False") or were at the same end of a 
rating scale. In this case, the content of the scale will be inextricably confounded 
with individuals’ preferences to use one or the other end of the rating scale (i.e., 
response "acquiescence"). In general, each of the scales in an optimal marker set 
should have an equal number of items representing the presence of an attribute and 
either its opposite attribute, if possible, or its absence.¹ Balanced-keyed scales possess
one kind of desirable method-heterogeneity (Nunnally & Bernstein, 1994, p. 
313). As we shall see, the balanced-keying desideratum presents a challenge in some 
domains (e.g., Neuroticism) where it is difficult to find or formulate a large number 
of candidate items representing the lack (or opposite) of that particular attribute. 

The importance of balanced keying suggests that optimal measures should be de- 
veloped not from single items but rather from parcels of items, each parcel con- 
sisting of an equal number of items keyed in each direction. Because acquiescence 
would then not contribute significantly to factors, factor analysis of parcels should 
be preferable to analyses of single items. One could use the sum of responses to 
pairs of opposite-keyed items (with scores not reflected) as an index of acquies- 
cence; this index may be a useful covariate, inasmuch as acquiescence variance can 
affect the factor structure (Hofstee, Ten Berge, & Hendriks, 1998). Without bal- 
anced keying, acquiescence is likely to be confounded with item content and with 
social-desirability responding (Hofstee et al., 1998). Marker sets developed by Saucier
(2000b), described later in this chapter, make use of parcels. 
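
As a minimal sketch of the acquiescence index and of balanced-parcel scoring described above, consider the following Python fragment. The item names, the 1-to-5 rating scale, and the use of a mean rather than a raw sum are illustrative assumptions, not details taken from the studies cited.

```python
def acquiescence_index(responses, opposite_pairs):
    """Mean unreflected response over pairs of opposite-keyed items.

    responses: dict mapping item name -> rating (assumed 1-5 scale).
    opposite_pairs: list of (positive_item, negative_item) tuples.
    """
    total = 0
    for pos_item, neg_item in opposite_pairs:
        # Scores are NOT reflected: content roughly cancels within a pair,
        # leaving the tendency to agree regardless of content.
        total += responses[pos_item] + responses[neg_item]
    return total / (2 * len(opposite_pairs))


def balanced_parcel_score(responses, pos_items, neg_items,
                          scale_min=1, scale_max=5):
    """Score a parcel with equal numbers of items keyed in each direction."""
    if len(pos_items) != len(neg_items):
        raise ValueError("parcel is not balanced")
    # Reflect the negatively keyed items before averaging.
    reflected = [scale_min + scale_max - responses[i] for i in neg_items]
    keyed = [responses[i] for i in pos_items] + reflected
    return sum(keyed) / len(keyed)


# Hypothetical respondent who tends to agree with everything.
yea_sayer = {"talkative": 5, "quiet": 4, "bold": 5, "timid": 4}
pairs = [("talkative", "quiet"), ("bold", "timid")]

acq = acquiescence_index(yea_sayer, pairs)
score = balanced_parcel_score(yea_sayer, ["talkative", "bold"],
                              ["quiet", "timid"])
print(acq, score)
```

In this hypothetical case the acquiescence index sits well above the scale midpoint, whereas the balanced parcel score stays close to it, because agreement with the reversed items offsets agreement with the positively keyed ones.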


Incorporating desirable elements of diverse 
scale-construction strategies 


Goldberg (1972; Hase & Goldberg, 1967) described three general strategies of test 
construction, labeled Intuitive, Internal, and External. In the Internal (or factor- 
analytic) strategy, items loading most highly on a factor are selected for the scale 
measure of the factor. Only the internal structure of the initial item pool determines 
item selection and keying direction, although the labeling of the scales developed by 
this strategy rests on the test constructor's personal judgment. Because marker sets 
are by definition based on factors, it is the Internal strategy that we will emphasize 
in this chapter. However, the Internal strategy used alone can lead to limitations in 
the resulting scales. Elements of the two other strategies can make an incremental 
contribution to marker set construction, as seen in our next two criteria. 


1 At a more technical level, it should be apparent that balanced keying will not guarantee perfect 
balance between the two types of items, since the correlations among the items of each type will 
affect the variance associated with each of those half-scales; differences in these correlational 
patterns can lead to differences in the relative weights of the half-scales in the composite measure. 
Nonetheless, balanced keying will generally provide at least some rough control of this problem. 
Criterion 3: Intuitive fit between item and construct 


Goldberg (1972) noted that: 


... the very characteristic of both the External and Internal strategies that gives them 
their power also provides their Achilles' Heel: namely, their dependence upon — 
and vulnerability to — characteristics of the particular samples used in their con- 
struction. The Intuitive strategy, in contrast, is minimally dependent on sample- 
specific characteristics; only at the stage of scale "purification" (e.g., discarding 
items with low correlations with scale scores) do sample characteristics have any 
chance to enter the scale construction process (pp. 49-50). 


Goldberg (1972) found that Intuitive scales, those developed solely from judg- 
ments about the item content, turned out to be of comparable validity to scales de- 
veloped by other strategies, a finding replicated by Burisch (1978; see Burisch, 
1984a). Indeed, Ashton and Goldberg (1973) found that the average psychology stu- 
dent was able to construct scales as reliable and valid as well-known External scales 
constructed by a far more expensive and time-consuming process. 

What gives the Intuitive approach its strength? Ashton and Goldberg (1973) 
noted that face validity and empirical validity should converge when there are con- 
ditions of mutual trust between subjects and investigators, as in typical self-reports 
under anonymous conditions (though not necessarily when there is something to 
gain by deception). Under such research conditions, it has long been known that the 
more directly the content of the items corresponds to the content of the construct, the 
better is the measure; and alternatively the more "subtle" are the items (in terms of 
the scoring keys), the less robust are those items across different subject samples and 
assessment contexts (Goldberg & Slovic, 1967; Jackson, 1971; Norman, 1963). 

What's the take-home message for developers of marker scales? There is much to 
gain by ensuring that the items relate to one's intuitive or theoretical understanding 
of the content of the dimension in question. Items that do not have this relation are 
more prone to be reflecting artifacts, or chance characteristics of the sample at hand. 
Thus, the Intuitive approach provides some assurance against faulty reliance on 
sample-specific characteristics. 


Criterion 4: Suitable bandwidth 


Hase and Goldberg (1967) described the External strategy as one in which the items 
are selected on the basis of their associations with some external criterion (e.g., peer 
ratings, job performance). In one version of this strategy, the test constructor ini- 
tially attempts to locate two distinct groups of subjects who differ in some signifi- 
cant manner (e.g., schizophrenics vs. normals, lawyers vs. people in general, males 
vs. females) or who fall at each of the two poles of a personality trait (as determined, 
for example, by peer ratings). The test items are then administered to members of 
both criterion groups, and those items that differentiate most strongly between the 
groups are retained for the scale. In the pure form of this strategy, only the empiri- 
cally discovered discriminating power of the item determines item selection for a 
scale, and the scale is typically labeled in terms of the criterion groups used. Com- 
mon characteristics of scales developed from the External strategy are their hetero- 
geneity in content, which results in rather low intercorrelations among the items; to 
ensure high Alpha coefficients for the resulting scales, the scales must be quite long. 

As already noted, two studies (Hase & Goldberg, 1967; Ashton & Goldberg, 
1973) failed to find any validity advantages for External scales. However, Goldberg 
(1972) found that the External strategy appears to produce a broader bandwidth in- 
strument — that is, one valid for a broader array of criteria — though one with 
slightly lower fidelity for the most predictable criteria. The slight advantage in 
bandwidth must be due to these less homogeneous scales including some per- 
sonologically relevant variance that is not included in the scales developed by the 
Internal and Intuitive strategies. Although we do not typically advocate the use of 
the External strategy, these findings suggest an important caution for developers of 
marker sets. To the extent that an investigator seeks to maximize homogeneity, 
he/she may be unknowingly compromising validity, especially with respect to any 
additional criteria beyond those that may be originally anticipated (Loevinger, 
1954). 

One application of the External strategy of scale construction that has been used 
to develop marker scales relies on the selection of items with particularly strong cor- 
relations between self and peer descriptions of the target person. For example, in the 
development of their Five-Factor Personality Inventory (FFPI) Hendriks, Hofstee, 
and De Raad (1999) used self-peer agreement as a primary (external) criterion for 
item selection. To the extent that marker items selected using this criterion are not 
particularly univocal indicators of the factors they were selected to approximate, this 
strategy can lead to high inter-scale associations, as is true of the FFPI scales. 

Representative sampling is an alternative approach that promotes the selection of 
content with broad bandwidth. Loevinger (1957) suggested that in any measure “the 
various areas or subareas of content should be represented in proportion to their life- 
importance" and noted that Cattell, an early advocate of the lexical approach, “as- 
sumed that life-importance could be judged by dictionary representation" (p. 659). 
Representative sampling of items from some domain of content is no different in 
principle from representative sampling of subjects from some population of interest. 
In both cases, one must select the strata, regions, or facets that one wants to sample, 
and then one selects the individual persons or items within each class on a quasi- 
random basis. For the representative sampling of items, one may attempt to include 
representatives of as wide a range of variables as one can locate. To the extent to 
which one can locate a full range of facets in the domain, one can sample broadly, 


2 In this chapter, we focus on bandwidth at the scale level. Broad item-level bandwidth (i.e., the 
extent to which a single item captures a broad array of content) might lead to an increase rather 
than a decrease in scale homogeneity. Broad items might be constructed using broad, familiar 
descriptive concepts (e.g., Is good, Is attractive) or, more problematically given our Criterion 1, by 
joining several forms of content by conjunctions (e.g., Is kind and generous and humble). 
and thus one's measure should be associated with a wider range of potential criterion
variables. 
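
The stratified, quasi-random selection just described can be sketched roughly as follows. The facet names, the item lists, and the proportional-quota rule (a facet's share of the pool standing in for its "dictionary representation") are illustrative assumptions rather than a procedure reported in the studies cited here.

```python
import random


def representative_sample(items_by_facet, n_total, seed=0):
    """Draw items from every facet in proportion to the facet's pool size.

    items_by_facet: dict mapping facet name -> list of candidate items.
    n_total: approximate number of items wanted overall.
    Returns a dict mapping facet name -> list of sampled items.
    """
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    pool_size = sum(len(items) for items in items_by_facet.values())
    sample = {}
    for facet, items in items_by_facet.items():
        # Proportional quota, but never drop a facet entirely.
        quota = max(1, round(n_total * len(items) / pool_size))
        sample[facet] = rng.sample(items, min(quota, len(items)))
    return sample


# Hypothetical pool: three facets of unequal size (12, 6, and 2 items).
pool = {
    "sociability": [f"sociability_{i}" for i in range(12)],
    "assertiveness": [f"assertiveness_{i}" for i in range(6)],
    "humor": [f"humor_{i}" for i in range(2)],
}
chosen = representative_sample(pool, n_total=10)
print({facet: len(items) for facet, items in chosen.items()})
```

Because each quota is rounded separately, the quotas need not sum exactly to `n_total`; a production version would have to reconcile the remainder.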

Representative sampling is consonant with the goal of "content validation." 
Content validation is appropriate to situations where two conditions hold: (a) valid- 
ity depends greatly on the adequacy with which a specified domain of content is 
sampled and (b) the measure must stand by itself as an adequate measure of what it 
is supposed to measure, with no ultimate gold-standard criterion ever likely to be- 
come available to serve in its validation. Content validation generally involves refer- 
ence to a standard source or to relatively objective expert views (which might be 
represented in the literature). One example of an attempt to provide a representative 
sample of constructs from the scientific literature is the set of six facet scales tar- 
geted at each of the five factors included in the NEO-PI-R (Costa & McCrae, 1992). 

A study by Peabody (1987) exemplifies a representative-sampling approach. He 
began with an item pool of 571 personality adjectives derived from previous re- 
search (e.g., Goldberg, 1982; Norman, 1967). These terms were reduced systemati- 
cally to a set of 53 bipolar pairs, which were included as a representative set in the 
studies by Peabody and Goldberg (1989). Similarly, Saucier (in press) developed 
100 representative parcels based on 500 very high frequency adjectival person- 
descriptors in English. Pairs of terms whose highest correlation (positive or nega- 
tive) was with each other were supplemented by additional terms as needed to in- 
crease Alpha. A marker set derived from Saucier's representative set of parcels is 
described later in this chapter. 

Goldberg's (1990) 133 clusters provide another illustration of Big Five factor 
markers based on representative sampling. The starting point was a set of 1,431 
personality adjectives that Norman in unpublished research had classified into 75 
categories. Using criteria of (a) lexicographically documented synonymity and (b) 
relatively homogeneous social-desirability values, Goldberg (1990, Study 2) reduced the 
terms to 479, grouped into 133 clusters.³ In a further study leading to a revised set of 
100 clusters, Goldberg used perhaps the most common item-selection criterion — 
that of internal consistency — our next topic. 


3 The criterion of homogeneous desirability values is related to a criterion of homogeneous response 
means (similar desirability values tend to lead to similar means), and is highly compatible with the 
criterion of maximizing internal consistency. This is because variables with similar means have a 
higher maximum intercorrelation than do variables with differing means. However, this criterion is 
not harmonious with aspects of modern test theory that critique "parallelism" (e.g., sets of nearly 
redundant items) and put a premium on scales whose items have a wide range of difficulty levels 
(with the response-mean parameter being analogous to difficulty level). One could argue, as we do 
later, that there is advantage in using the short homogeneous parcel (rather than the item) as a 
basic unit, and aggregating parcels having a range of difficulty levels (response means) into the 
marker scale. 
Criteria consonant with classical test theory 


Criterion 5: Maximizing internal consistency 


Virtually all psychologists have been taught that a desirable feature of any measure 
is high reliability (the relative absence of measurement error). Internal consistency is 
the most commonly employed form of reliability, and it is typically estimated by 
Coefficient Alpha. Alpha is a function of scale length (the longer the higher) and 
homogeneity (the average intercorrelation among the items in a scale). Naturally, 
then, maximizing Coefficient Alpha has been widely used as an item-selection crite- 
rion. Using contemporary data-analysis software, one can easily identify and cull out 
those items whose corrected item-total correlations are sufficiently low that their 
removal increases the value of Alpha in the sample under study. We will refer to 
this common strategy as "Alpha-maximizing." 
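
To make the Alpha-maximizing procedure concrete, the following Python sketch computes Coefficient Alpha from raw item scores and greedily drops whichever item's removal raises Alpha most. The function names, the `min_items` floor, and the six-respondent data set are hypothetical; this is an illustration of the general strategy under those assumptions, not the specific routine used in any of the studies discussed here.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Coefficient Alpha; items is a list of score lists, one per item."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]
    sum_item_var = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - sum_item_var / pvariance(totals))


def alpha_maximize(items, min_items=3):
    """Greedily drop the item whose removal increases Alpha the most.

    items: dict mapping item name -> list of scores (same respondents).
    Stops when no removal helps or only min_items remain.
    Returns (list of retained item names, Alpha of the retained set).
    """
    kept = dict(items)
    best = cronbach_alpha(list(kept.values()))
    while len(kept) > min_items:
        # Alpha that would result from deleting each item in turn.
        trials = {
            name: cronbach_alpha([v for n, v in kept.items() if n != name])
            for name in kept
        }
        drop = max(trials, key=trials.get)
        if trials[drop] <= best:
            break
        best = trials[drop]
        del kept[drop]
    return list(kept), best


# Hypothetical ratings from six respondents; item "d" is mostly noise.
data = {
    "a": [1, 2, 3, 4, 5, 5],
    "b": [2, 2, 3, 4, 4, 5],
    "c": [1, 3, 3, 5, 4, 5],
    "d": [3, 1, 4, 2, 5, 1],
}
alpha_all = cronbach_alpha(list(data.values()))
kept_items, alpha_kept = alpha_maximize(data)
print(round(alpha_all, 2), kept_items, round(alpha_kept, 2))
```

Because the deletions are chosen and evaluated in the same sample, the resulting Alpha is likely to be optimistic; checking it in a fresh sample is the natural safeguard.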


Application: Goldberg's (1990) 100 clusters 


Goldberg's (1990, Study 3) creation of 100 clusters illustrates the Alpha- 
maximizing approach. The 133 clusters developed in Goldberg's Study 2 included a 


4 There appear to be two prime reasons for the relative emphasis on internal consistency over retest 
stability: (a) unlike internal consistency coefficients, retest stability coefficients are influenced by 
extraneous elements like carryover effects and practice effects as well as the length of time 
between measurements; (b) the assumption that all personality attributes must ideally be stable 
over time can be questioned. However, one could conceivably use retest stability as an item 
selection criterion, particularly if one wished to favor stable over unstable attributes. Obviously, for 
attributes that are temporary states (e.g., emotions) retest stability would be expected to be 
moderate. If one focuses on internal consistency, an alternative to Alpha is Omega (Zinbarg, Yovel, 
Revelle, & McDonald, 2000) which is a function of average general factor loading more than average 
intercorrelation. 


5 In the simplest IRT (item response theory) model, the one-parameter-logistic (1PL) model, item 
difficulty levels are allowed to vary, but item-discrimination indices are constrained to be equal. 
The nearest equivalents in classical measurement approaches to IRT item-discrimination indices are 
either the corrected item-total correlation or the loading of an item on a factor that represents the 
attribute. Accordingly, a good scale under the 1PL model has some analogy to a scale in which all 
the items have similarly high corrected item-total correlations, or similarly high loadings on a 
common factor. A scale whose items discriminate equally well has some distinct virtues. When 
these items are subjected to reliability analysis, it will be impossible to improve the internal 
consistency of the scale by deleting any item. Moreover, if these items were subjected to a factor 
analysis, they would have the maximum possible tendency to cluster together on a single factor, 
both in replications as well as in the original sample. 
few single-term categories (of uncertain reliability) and a few clusters with low internal
consistency coefficients. In new samples of data using the 479 adjectives included
in the 133 clusters, Goldberg eliminated the single-term categories, and iteratively
eliminated the least homogeneous items from a number of other clusters 
(i.e., those with the lowest item-total correlations); a few new synonym sets were 
developed from the items that were no longer included in the remaining clusters 
(again based mainly on the Alpha-maximizing criterion). The result was a set of 100 
clusters based on 339 adjectives.⁶ 

As would be expected, the Alpha values of the new 100 clusters were higher than 
those from the initial set of 133 scales. In general, Alpha values based on original 
responses are higher than those based on ipsatized responses (i.e., cases Z-scored so 
that each respondent has a mean of 0 and a variance of 1 across all of the items), be- 
cause individual differences in response biases tend to increase indices of internal 
consistency. Using ipsatized data, the Alpha values for the 100 clusters averaged 
.61, as compared to .48 for the initial 133 scales. The items included in these 100 
clusters, along with the scale reliabilities, are available in an earlier report (Gold- 
berg, 1990, Table 3). 
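
Ipsatization as defined above (Z-scoring each respondent across all of his or her item responses) can be sketched in a few lines of Python. The two response vectors are hypothetical, and the population standard deviation is used so that each respondent's variance across items is exactly 1.

```python
from statistics import mean, pstdev


def ipsatize(row):
    """Z-score one respondent's ratings across all items.

    The result has mean 0 and (population) variance 1 for that respondent.
    """
    m = mean(row)
    s = pstdev(row)
    if s == 0:
        raise ValueError("respondent gave the same rating to every item")
    return [(x - m) / s for x in row]


# Two hypothetical respondents with the same profile shape; the second
# simply uses a higher region of the rating scale.
low_user = [1, 2, 3, 4, 5]
high_user = [2, 3, 4, 5, 6]

z_low = ipsatize(low_user)
z_high = ipsatize(high_user)
print([round(z, 2) for z in z_low])
print([round(z, 2) for z in z_high])
```

After ipsatization the two profiles coincide, which illustrates why individual differences in scale usage can no longer inflate the item intercorrelations, and hence why Alpha values tend to be lower for ipsatized than for original responses.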

The 100 clusters have the advantage of a high degree of representative sampling: 
A wide array of personality attributes is included. Another advantage is that the 
clusters can be used as lower-level facets in their own right, affording an abundance 
of information for predictive purposes. One disadvantage is that no cluster included 
reverse-keyed items. Another is that 339 adjectives must be administered in order to 
provide scores for the Big Five factors. Moreover, one has no guarantee that the five 
derived factor scores would be the same in different subject samples, partly because 
some of the clusters have complex associations with the Big Five factors, and some 
have only weak relations with any of them. In other words, this is a richly detailed
but not an efficient set of factor markers. In subsequent work, Goldberg 
(1992) developed more efficient (albeit less representative) marker sets by applying 
two other widely used item-selection criteria, to which we now turn. 


Criterion 6: Factor saturation (high loadings on the targeted factor) 


This criterion is central to the Internal strategy of scale construction (Goldberg, 
1972). The rationale is clear-cut. A set of markers for a factor is designed to repre- 
sent that factor. What more efficient way to represent the factor than with the items 
that have the highest loadings on (or extension correlations with) that factor? 

High internal consistency is a necessary but not a sufficient indicator of unidi- 
mensionality (Zinbarg, Yovel, Revelle, & McDonald, 2000). However, Criteria 5 
(Internal consistency) and 6 (Factor saturation) generally tend to converge, most 
strongly so in the special case of a structure that properly has only one factor. The 
first unrotated factor has the maximum internal consistency of any possible linear 


6 However, eliminating items based on an internal consistency criterion can capitalize on chance in a 
way similar to stepwise regression analyses, and thus both procedures share this potential liability. 
combination of the items analyzed. Factor scores on this single factor are highly related
to a scale derived from the highest loading items, and thus either can be used to 
index individual differences on that factor. McDonald (1999) notes that a "(psychometrically)
homogeneous test is one whose items measure just one attribute in 
common — a common factor," a supposition that can be tested by "seeing if the responses
to them fit the single factor model" (p. 78). In a simplified scale-construction
approach, one might merely utilize Criteria 5 and 6, which are already 
highly convergent. Our next criteria can each be seen as a way of correcting for the 
limitations of this simplified approach. 


Criterion 7: Factor discrimination (low loadings on other factors) 


In a factor structure like the Big Five that includes more than one factor, those vari- 
ables having high loadings on each factor can be distinguished from one another by 
the relative magnitude of their loadings on the other factors. Some variables may 
have high loadings on one or more of the other factors; variables with such high 
"complexity" with respect to the factors may be viewed as factorial “blends” or as 
"interstitial" variables. Some variables may have low loadings on all of the other 
factors that are retained; variables with such extreme "simplicity" with respect to the 
factors might be thought of as factorially “univocal” or as “factor-pure.” This low- 
divergent-loadings criterion tends to enhance the unidimensionality of the set of 
variables associated with each factor. 

If one combines Criteria 6 (Factor saturation) and 7 (Factor discrimination), one 
selects those items that simultaneously load most highly, and most univocally, on a 
given factor. These are the variables that conform most directly to the factor rotation 
concept of "simple structure," as they capture the distinct features of the factor most 
directly. The most efficient set of factor markers are these univocal variables, and 
they are the ones that are most likely to produce across-sample replicability of the 
factor structure. That is, factor markers consisting of univocal items can be expected 
to reproduce the original factor structure in replication studies more surely than 
those with either low or complex associations with the factors. 
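
A combined application of Criteria 6 and 7 can be sketched as a simple filter over a matrix of factor loadings. In the Python fragment below, the loading values, the two cutoffs, and the ranking rule (primary loading minus largest secondary loading) are all illustrative assumptions; they are not the thresholds used in any study described in this chapter.

```python
def select_markers(loadings, n_per_factor=2,
                   min_primary=0.40, max_secondary=0.25):
    """Pick items that load highly and univocally on one factor.

    loadings: dict mapping item name -> list of loadings, one per factor.
    Returns a dict mapping factor index -> list of selected item names.
    """
    n_factors = len(next(iter(loadings.values())))
    candidates = {f: [] for f in range(n_factors)}
    for item, row in loadings.items():
        # Absolute values: a negative loading marks the opposite pole.
        abs_row = [abs(x) for x in row]
        primary = max(range(n_factors), key=abs_row.__getitem__)
        secondary = max(v for f, v in enumerate(abs_row) if f != primary)
        # Criterion 6 (saturation) and Criterion 7 (discrimination).
        if abs_row[primary] >= min_primary and secondary <= max_secondary:
            candidates[primary].append((abs_row[primary] - secondary, item))
    return {
        f: [item for _, item in sorted(cands, reverse=True)[:n_per_factor]]
        for f, cands in candidates.items()
    }


# Hypothetical loadings on two orthogonal factors.
example = {
    "talkative": [0.70, 0.05],
    "bold":      [0.60, 0.10],
    "cheerful":  [0.55, 0.45],   # a blend: rejected by Criterion 7
    "kind":      [0.08, 0.65],
    "warm":      [0.30, 0.62],   # secondary loading too high
    "helpful":   [-0.02, 0.58],
    "vague":     [0.20, 0.15],   # fails Criterion 6
}
markers = select_markers(example)
print(markers)
```

Tightening `max_secondary` yields more univocal markers at the cost of rejecting interstitial items such as the hypothetical "cheerful" and "warm," which is exactly the bandwidth trade-off raised under Criterion 4.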

We note one caveat about univocal variables, to which we shall return later: 
Other things being equal, the more univocal are the variables included in a marker 
set, the more homogeneous and narrow are they likely to be. But, as indicated by our 
Criterion 4, the bandwidth of one's markers constitutes an important property of any 
marker set. To the extent to which one desires a broad-bandwidth instrument, perhaps
by the representative sampling of lower-level facets, one will usually go beyond
extremely univocal variables, which can be homogeneous but narrow in content
reference.⁷ 


7 Criterion 7 can be related to the Stylistic strategy of scale construction (Hase & Goldberg, 1967) as 
exemplified in the development of social-desirability scales. Big Five scales tend to have their 
favorable poles associated, suggesting that the intercorrelations reflect a single desirability factor. 
Thus, the “low loadings” criterion tends to minimize the influence of this stylistic factor. 
Application: Goldberg's (1992) 100 unipolar markers 


Although the 100 clusters from Goldberg (1990) have the advantage of providing 
markers with an unusually broad bandwidth, they require the administration of 339 
adjectives, and thus they are hardly a maximally efficient marker set. In constructing 
some more efficient sets of Big Five markers, Goldberg (1992) took into account all 
of the criteria that we have heretofore discussed. Among the marker sets described 
in that article was a set of 100 unipolar factor markers that has since become widely 
used. 

From a pool of 566 reasonably common personality-descriptive adjectives, Gold- 
berg selected 116 terms having high loadings on one factor and, relative to other 
candidate terms for that factor, relatively low loadings on the other factors (Criteria 
6 and 7). The initial set of 116 terms was reduced to 100 using the internal-consistency
criterion (Criterion 5), as well as a criterion of replicability in their factor
loadings across three samples of subjects. In order to make the marker sets relatively
equal in size, 20 terms were selected for each factor.⁸ In order to reduce the 
effect of individual differences in response scale usage, 10 items were selected for 
the positive and 10 for the negative pole of each factor, with the exception of Factor 
IV (Emotional Stability) where a dearth of suitable positive items led to a mix of 6 
positive and 14 negative items. 

Goldberg (1992) showed that each of the five 20-item subsets of these 100 
markers, when considered as separate scales, yielded highly reliable scores: The 
mean (across factors) Alpha coefficients ranged from .85 to .93 depending on the 
data set, with all of the coefficients above .80 in each data set. Partly because of 
these favorable psychometric properties, this marker set has been widely used to in- 
dex the Big Five factors. However, there are at least three potential problems with 
this marker set, each of which has led to the development of different types of new 
markers. First, many investigators desire to include some markers of each of 
the Big Five domains in an extensive battery of other measures, but they balk at devoting
the testing time needed to administer all 100 items; for these purposes, 
shorter and thus more efficient marker sets are needed. In addition, although the Big 
Five factors are orthogonal conceptually (and when operationalized via orthogonal 
rotations), the five scales scored from the 100 markers are typically at least slightly 
interrelated; less highly related marker sets would be desirable in some contexts. 
These problems are addressed by additional criteria for item-selection that we will 
discuss shortly. 

Another problem with Goldberg's (1992) 100 markers involves the nature of the 
items themselves. Because single trait-descriptive adjectives encode behaviors at 
such a high level of abstraction, they are often difficult to translate precisely from 


8 Note that if one's goal was an even more complete equality of the scale variances, one might have 
had to select slightly different numbers of items for each of the five factors. 
one language to another. That is, although it is often possible to locate a term in each 
of the two languages that refers to much the same type of behavior, the two terms 
may differ in their social-desirability value (Hofstee, 1990). More behaviorally 
specified item formats (such as the items included in the International Personality 
Item Pool; Goldberg, 1999) could turn out to be far easier to translate with precision. 
One way of addressing this problem is illustrated in the following application. 


Applying these criteria to an alternative item format 


Goldberg (1999) reported the development of an Internet collaboratory for the ad- 
vancement of personality measurement, based on an item format pioneered by Hen- 
driks, Hofstee, and De Raad (1999). This International Personality Item Pool (IPIP) 
now includes nearly 2,000 items, each a short verbal statement describing some aspect
of one's thoughts, feelings, or behaviors (e.g., Act wild and crazy; Don't care 
about rules; Sense others' wishes; Have a soft heart). Preliminary personality scales 
have been developed from the IPIP to measure the 45 bipolar facets from the Abridged
Big Five-dimensional Circumplex model (AB5C) of Hofstee, De Raad, and 
Goldberg (1992). Table 1 lists the number of items keyed in each direction, the 
mean item intercorrelation, and the Coefficient Alpha reliability estimate for each of 
these 45 AB5C marker scales; the items included in each scale are listed in Goldberg 
(1999). Most of these scales include about 10 items, with mean intercorrelations 
around .25 and Alpha coefficients around .80. 
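
The link between scale length, mean item intercorrelation, and Alpha can be checked with the standardized form of Alpha, k·r / (1 + (k − 1)·r). The short Python sketch below applies it to the typical values just mentioned. It assumes the standardized formula as an approximation; Alpha computed from raw item variances can differ slightly, so the figures need not match published values to the second decimal.

```python
def standardized_alpha(k, mean_r):
    """Standardized Alpha for k items with mean intercorrelation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def items_needed(target_alpha, mean_r):
    """Scale length at which standardized Alpha reaches target_alpha."""
    return target_alpha * (1 - mean_r) / (mean_r * (1 - target_alpha))


typical = standardized_alpha(10, 0.25)
print(round(typical, 2))                 # 10 items, mean r of .25
print(round(items_needed(0.80, 0.25), 1))
print(round(items_needed(0.80, 0.15), 1))
```

The second function shows how quickly the required scale length grows as homogeneity falls, which is why heterogeneous scales must be long to reach a given Alpha.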

The coordinates for the AB5C model were based on Goldberg's (1992) 100 unipolar
markers. Therefore, the 45 bipolar IPIP-AB5C facet scales can be regarded as 
a translation of an adjectival Big Five marker set into a more behaviorally specified 
item format, by means of an application of "uniform" sampling. Uniform sampling 
of a semantic space is a characteristic of the "circumplex" tradition (e.g., Wiggins, 
1980) in which the locations of variables in two dimensions are projected onto a cir- 
cular representation, and then exemplar items are selected at equally spaced loca- 
tions around the circle. In uniform sampling, regions where variables are densely 
concentrated are systematically undersampled whereas more sparsely populated in- 
terstitial regions are oversampled (Goldberg, 1992), thus contrasting markedly with 
representative sampling. 
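
Uniform sampling around a circle can be illustrated with a short Python sketch: each item's two loadings are converted to an angle, and the item nearest each of several equally spaced target angles is selected. The item names, the loadings, and the choice of four target locations are hypothetical and serve only to show the contrast with representative sampling.

```python
import math


def angle_deg(x, y):
    """Angular location (0-360 degrees) of a variable with loadings (x, y)."""
    return math.degrees(math.atan2(y, x)) % 360


def angular_gap(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def uniform_sample(items, n_points):
    """Select the item closest to each of n_points equally spaced angles.

    items: dict mapping item name -> (loading on axis 1, loading on axis 2).
    """
    angles = {name: angle_deg(x, y) for name, (x, y) in items.items()}
    chosen = []
    for k in range(n_points):
        target = k * 360 / n_points
        chosen.append(min(angles,
                          key=lambda name: angular_gap(angles[name], target)))
    return chosen


# Hypothetical loadings: a dense cluster near 0 degrees, sparse elsewhere.
items = {
    "a": (1.0, 0.0), "b": (0.9, 0.1), "c": (0.8, 0.15),   # dense region
    "d": (0.1, 0.9),                                       # near 90 degrees
    "e": (-0.7, -0.1),                                     # near 180 degrees
    "f": (0.1, -0.8),                                      # near 270 degrees
}
selected = uniform_sample(items, n_points=4)
print(selected)
```

Only one of the three items in the dense region is retained, whereas every sparsely populated region contributes an item. Note that in a very sparse pool the same item could be chosen for two adjacent targets, which this sketch does not guard against.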

As markers of the broad Big Five domains, one could use the five IPIP scales 
measuring the factor-pure AB5C facets. However, each of these IPIP scales includes 
only items that are more highly associated with their narrow facet than with any of 
the other facets, and such items may not necessarily be optimal measures of the five 
domains alone. Moreover, those scales are targeted at the Big-Five factor structure 
of phenotypic personality attributes (Saucier & Goldberg, 1996b), not at McCrae 
and Costa’s (1996) Five-Factor Model of personality traits, which differ to some de- 
gree in how the factors are conceptualized. Some investigators may prefer to meas- 
ure the constructs in the latter model rather than (or in addition to) those in the for- 
mer one. Consequently, we have developed IPIP-based measures of both models. 
Table 1. Characteristics of the 45 preliminary IPIP scales targeted at the AB5C facets

AB5C Facet                 Provisional Label    No. of Items    Mean Item r    Coef. Alpha
Factor I
 I+/I+ vs. I-/I-           Gregariousness       4 + 6 = 10      .34            .83
 I+/II+ vs. I-/II-         Friendliness         5 + 5 = 10      .37            .85
 I+/III+ vs. I-/III-       Assertiveness        9 + 3 = 12      .20            [illegible]
 I+/IV+ vs. I-/IV-         Poise                5 + 5 = 10      .31            .82
 I+/V+ vs. I-/V-           Leadership           5 + 5 = 10      [illegible]    .82
*I+/II- vs. I-/II+         Provocativeness      8 + 3 = 11      .19            .72
 I+/III- vs. I-/III+       Self-Disclosure      8 + 2 = 10      .26            .78
 I+/IV- vs. I-/IV+         Talkativeness        8 + 2 = 10      .35            .84
*I+/V- vs. I-/V+           Sociability          [illegible]     .16            .66
Factor II
 II+/II+ vs. II-/II-       Understanding        5 + 5 = 10      .30            .81
 II+/I+ vs. II-/I-         Warmth               9 + 2 = 11      .33            .84
*II+/III+ vs. II-/III-     Morality             5 + 7 = 12      .18            [illegible]
 II+/IV+ vs. II-/IV-       Pleasantness         6 + 6 = 12      .22            .76
*II+/V+ vs. II-/V-         Empathy              5 + 4 =  9      .20            .70
*II+/I- vs. II-/I+         Cooperation          2 + 10 = 12     .18            .73
*II+/III- vs. II-/III+     Sympathy             6 + 6 = 12      .20            .74
*II+/IV- vs. II-/IV+       Tenderness           9 + 4 = 13      .18            .74
 II+/V- vs. II-/V+         Nurturance           6 + 7 = 13      .16            [illegible]
Factor III
 III+/III+ vs. III-/III-   Conscientiousness    6 + 7 = 13      .19            .75
 III+/I+ vs. III-/I-       Efficiency           5 + 6 = 11      .30            .83
*III+/II+ vs. III-/II-     Dutifulness          6 + 7 = 13      .21            .78
 III+/IV+ vs. III-/IV-     Purposefulness       5 + 7 = 12      .27            .81
 III+/V+ vs. III-/V-       Organization         9 + 3 = 12      .23            .78
 III+/I- vs. III-/I+       Cautiousness         5 + 7 = 12      .21            [illegible]
*III+/II- vs. III-/II+     Rationality          8 + 6 = 14      .13            .67
 III+/IV- vs. III-/IV+     Perfectionism        7 + 2 =  9      .26            [illegible]
 III+/V- vs. III-/V+       Orderliness          7 + 3 = 10      .27            .78
Factor IV
 IV+/IV+ vs. IV-/IV-       Stability            5 + 5 = 10      .37            .86
 IV+/I+ vs. IV-/I-         Happiness            5 + 5 = 10      .34            .84
 IV+/II+ vs. IV-/II-       Calmness             4 + 6 = 10      .38            .83
 IV+/III+ vs. IV-/III-     Moderation           4 + 6 = 10      .24            .76
 IV+/V+ vs. IV-/V-         Toughness            4 + 8 = 12      .29            .84
 IV+/I- vs. IV-/I+         Impulse Control      2 + 9 = 11      .24            .78
 IV+/II- vs. IV-/II+       Imperturbability     2 + 7 =  9      [illegible]    .84
*IV+/III- vs. IV-/III+     Cool-headedness      0 + 10 = 10     .21            .73
*IV+/V- vs. IV-/V+         Tranquility          7 + 4 = 11      .22            .76
Factor V
 V+/V+ vs. V-/V-           Intellect            6 + 5 = 11      .27            .81
 V+/I+ vs. V-/I-           Ingenuity            6 + 3 =  9      .37            .84
*V+/II+ vs. V-/II-         Reflection           8 + 2 = 10      .26            .75
*V+/III+ vs. V-/III-       Competence           8 + 0 =  8      .26            .74
 V+/IV+ vs. V-/IV-         Quickness            7 + 3 = 10      .37            .84
*V+/I- vs. V-/I+           Introspection        10 + 2 = 12     .18            [illegible]
 V+/II- vs. V-/II+         Creativity           5 + 5 = 10      .30            .81
 V+/III- vs. V-/III+       Imagination          5 + 5 = 10      .27            .78
*V+/IV- vs. V-/IV+         Depth                7 + 2 =  9      .27            [illegible]
Mean                                                            .26            .78

Note: All analyses are based on the responses of 501 adult subjects from the Eugene-Springfield
Community Sample. * These scales have been augmented with items from other AB5C facets.
Specifically, we have developed both 50-item (10 items per domain) and 100-item
(20 items per domain) scales to measure the five domains in each of the two models 
(Goldberg, 1997). 

Over 500 adult participants from a community sample completed both the 240-
item NEO-PI-R and an initial set of 1,252 IPIP items (Goldberg, in press); the IPIP
items were administered in three separate questionnaires over a three-year period of 
time, and the NEO inventory was administered on another occasion during the same 
time period. Each of the participants had previously completed an inventory of 360 
trait-descriptive adjectives which included Goldberg's (1992) 100 unipolar Big-Five 
factor markers. The five orthogonal factor scores from the 100 markers (based on 
ipsatized data) served as the criteria for the Big-Five constructs, and scores on the 
five 48-item domain scales from the NEO-PI-R served in that role for the Five- 
Factor (NEO) model. Responses to all of the IPIP items were first correlated with 
each of the criterion indices, and the items were then categorized by their highest 
correlations. Initial scales were developed using the most highly related items in 
each category, and if necessary these scales were then refined by internal consis- 
tency analyses (Criterion 5, maximizing Alpha). 
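
The item-categorization step described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the data are invented, the function names are ours, and items are assigned by the absolute value of their correlations so that negatively keyed items are handled (the text does not say how signs were treated).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def assign_items(item_responses, criteria):
    """Assign each item to the criterion with which it correlates most highly,
    then rank the items within each criterion by |r|, highest first."""
    buckets = {name: [] for name in criteria}
    for item, responses in item_responses.items():
        rs = {name: pearson(responses, scores) for name, scores in criteria.items()}
        best = max(rs, key=lambda name: abs(rs[name]))
        buckets[best].append((item, rs[best]))
    for name in buckets:
        buckets[name].sort(key=lambda pair: abs(pair[1]), reverse=True)
    return buckets


# Hypothetical criterion scores (five respondents) and item responses.
criteria = {"E": [1, 2, 3, 4, 5], "C": [2, 1, 4, 3, 5]}
items = {"talk": [1, 2, 3, 4, 6], "tidy": [2, 1, 4, 3, 5], "quiet": [5, 4, 3, 2, 1]}
buckets = assign_items(items, criteria)
print({name: [item for item, _ in ranked] for name, ranked in buckets.items()})
```

An initial scale would then take the first n items in each list; any refinement by internal-consistency analysis is a separate step not shown here.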

Table 2 presents some characteristics of the new IPIP scales for the Big-Five
domains, including the number of positively and negatively keyed items in each scale,
its mean item intercorrelation, its Coefficient Alpha reliability estimate, and its
correlation with the orthogonal factor scores derived from the Big-Five adjective
markers. On average, the shorter scales had a mean item intercorrelation of .34, an
Alpha of .84, and a correlation of .67 with the factor markers (.81 when corrected for
unreliability); the longer scales had a mean item intercorrelation of .31, an Alpha of
.90, and correlated .70 with the markers (.78 when corrected). The items included in
each of these new Big-Five scales are provided on the IPIP Website.


Table 2. Characteristics of the Preliminary IPIP Scales Measuring the Big Five Domains

                           Number          Mean Item          Coefficient    Correlation
Big Five Domain            of Items        Intercorrelation   Alpha          with Markers
Shorter Scales
  I.   Extraversion         5 +  5 = 10    .40                .87            .73 [.84]
  II.  Agreeableness        6 +  4 = 10    .31                .82            .54 [.66]
  III. Conscientiousness    6 +  4 = 10    .29                .79            .71 [.90]
  IV.  Emot. Stability      2 +  8 = 10    .38                .86            .72 [.84]
  V.   Intellect            7 +  3 = 10    .34                .84            .67 [.80]
  Total/Mean               26 + 24 = 50    .34                .84            .67 [.81]
Longer Scales
  I.   Extraversion        10 + 10 = 20    .34                .91            .76 [.84]
  II.  Agreeableness       14 +  6 = 20    .28                .88            .57 [.65]
  III. Conscient.          11 +  9 = 20    .27                .88            .74 [.84]
  IV.  Emot. Stability      4 + 16 = 20    .35                .91            .74 [.81]
  V.   Intellect           13 +  7 = 20    .32                .90            .69 [.77]
  Total/Mean               52 + 48 = 100   .31                .90            .70 [.78]

Note: Values in brackets are correlations corrected for unreliability; these may be underestimates,
given that the reliabilities of the factor markers were assumed to be the same as those of their
corresponding IPIP scales.

Table 3 presents the corresponding values for the new IPIP scales measuring the 
constructs in the Five-Factor (NEO) model, including the correlations with the 48- 
item NEO domain scales. On average, the shorter scales had a mean item intercor- 
relation of .33, an Alpha of .82, a correlation with the NEO domains of .77 (.90 
when corrected); the longer scales had a mean item intercorrelation of .30, an Alpha 
of .89, and a mean correlation of .81 (again .90 when corrected). The items included 
in each of these new FFM scales are also provided at the IPIP Website. 

There are no common items among the scales within each scale set, although all 
of the items in the 10-item scales are included in their 20-item counterparts. The 
part-whole correlations between the shorter and the longer scales were: .95, .94, .95, 
.96, and .96 for the Big Five constructs, and .95, .92, .96, .95, and .96 for the Five-
Factor (NEO) constructs, both sets in Big Five order. The average of the
intercorrelations among the scales based on the Big-Five constructs, presented in
Table 4, was very slightly lower than for those based on the Five-Factor (NEO) constructs.
When corrected for attenuation due to the scale unreliabilities, the across-set con- 
vergence was essentially perfect (r = 1.00) for the Extraversion, Conscientiousness, 
and Emotional Stability (Neuroticism) constructs; the corrected correlations for the 
Agreeableness scales were .79 (.84) and for Intellect/Openness they were .83 (.86). 
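
The corrected values reported here and in the brackets of Tables 2 and 3 are consistent with the classical correction for attenuation, in which an observed correlation is divided by the square root of the product of the two reliabilities. The sketch below is illustrative; the function name is ours, and Coefficient Alpha is assumed to serve as the reliability estimate.

```python
from math import sqrt


def disattenuate(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Observed correlation corrected for unreliability in both measures."""
    return r_xy / sqrt(rel_x * rel_y)


# Table 2, shorter Extraversion scale: the marker reliability is assumed
# equal to that of the IPIP scale (.87), as stated in the table note.
print(round(disattenuate(0.73, 0.87, 0.87), 2))

# Table 3, shorter Neuroticism scale: IPIP Alpha .86, NEO domain Alpha .93.
print(round(disattenuate(0.82, 0.86, 0.93), 2))
```

These two calls reproduce the bracketed values of .84 and .92. Because the correction divides by a quantity smaller than one, corrected values can exceed 1.00 when the observed correlation approaches the reliabilities themselves.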


Table 3. Characteristics of the preliminary IPIP scales measuring the NEO domain constructs

                           Number          Mean Item          Coefficient    Correlation
NEO Domain                 of Items        Intercorrelation   Alpha          with NEO
Shorter Scales
  I.   Neuroticism          5 +  5 = 10    .37                .86            .82 [.92]
  II.  Extraversion         5 +  5 = 10    .38                .86            .77 [.88]
  III. Openness             5 +  5 = 10    .33                .82            .79 [.91]
  IV.  Agreeableness        5 +  5 = 10    .27                .77            .70 [.85]
  V.   Conscientiousness    5 +  5 = 10    .31                .81            .79 [.92]
  Total/Mean               25 + 25 = 50    .33                .82            .77 [.90]
Longer Scales
  I.   Neuroticism         10 + 10 = 20    .33                .91            .86 [.93]
  II.  Extraversion        10 + 10 = 20    .35                .91            .79 [.88]
  III. Openness            10 + 10 = 20    .29                .89            .83 [.92]
  IV.  Agreeableness       10 + 10 = 20    .23                .85            .78 [.90]
  V.   Conscientiousness   10 + 10 = 20    .31                .90            .80 [.88]
  Total/Mean               50 + 50 = 100   .30                .89            .81 [.90]

Note: Values in brackets are correlations corrected for unreliability. The Coefficient Alpha
reliability values for the 48-item NEO domain scales were: N = .93; E = .89; O = .91; A = .89; and
C = .91.




Table 4. Correlations among and between the preliminary IPIP scales measuring the domain
constructs from the Big Five and the Five Factor (NEO) Models

           I/E            II/A           III/C          IV/N           V/O
I/E        .93 (.96)      .28 (.39)      [illegible]    .18 (.27)      .35 (.40)
II/A       .15 (.22)      .63 (.73)      [illegible]    .19 (.23)      .17 (.18)
III/C      .24 (.28)      .22 (.21)      .81 (.87)      [illegible]    .03 (.07)
IV/N      -.35 (-.38)    -.43 (-.41)    -.36 (-.40)    -.89 (-.93)     .13 ([illegible])
V/O        .36 (.37)      .11 (.09)      .01 (.05)     -.08 (-.09)     .69 (.77)

Note: Correlations among the IPIP Big Five domain scales are presented above the main diagonal,
correlations among the IPIP Five Factor (NEO) domain scales are below the diagonal, and
correlations between the corresponding scales in each set are listed on the diagonal. (Correlations
based on the 20-item scales are listed in parentheses after the values for the 10-item scales.)
Factor I = Extraversion; Factor II = Agreeableness; Factor III = Conscientiousness; Factor IV =
Emotional Stability (versus Neuroticism); and Factor V = Intellect/Openness to Experience.


Criterion 8: Scale brevity, or keeping it short and sweet 


In measurement, brevity imparts efficiency, and thus brevity is generally desirable 
(Burisch, 1984b). We noted the value of item brevity with respect to our first crite- 
rion, but Criterion 8 addresses scale brevity. For some research, teaching, and as- 
sessment purposes, even a 100-item inventory, such as the marker set from Goldberg 
(1992), is too lengthy. However, because any abbreviated measure almost inevitably 
suffers from a loss of reliability compared to the full measure, there is a recurring 
cost involved in the creation of a "short form" of a longer measure. To minimize this 
cost, one must attempt to conserve internal consistency while culling items. By do- 
ing so, however, one could easily precipitate a decline in validity even though Alpha 
is relatively constant, because the scale is being made overly narrow and homogene- 
ous (Loevinger, 1954). Smith, McCarthy, and Anderson (2000) discuss other poten- 
tial problems in short-form development, stressing that short forms (a) be developed 
only on well-validated measures, (b) preserve the content coverage and subfactors of 
the longer form, (c) protect reliability, (d) demonstrate overlapping variance with the 
longer form when administered independently, (e) show a factor structure similar to 
that of the longer form, (f) have demonstrated validity and high correct classification 
rates in independent samples, and (g) show meaningful savings in time or resources. 
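
The reliability cost of abbreviation can be anticipated with the Spearman-Brown prophecy formula, which predicts the reliability of a scale whose length is changed by a given ratio. The sketch below is illustrative only: the function name and the example values are ours, and the formula assumes that the retained items are interchangeable with those removed, which purposeful item selection may violate.

```python
def spearman_brown(reliability: float, length_ratio: float) -> float:
    """Predicted reliability after changing scale length by length_ratio
    (e.g., 0.4 when an 8-item form is cut from a 20-item scale)."""
    return length_ratio * reliability / (1.0 + (length_ratio - 1.0) * reliability)


# Hypothetical case: a 20-item scale with reliability .90 cut to 8 items.
print(round(spearman_brown(0.90, 8 / 20), 2))
```

Under these assumptions the predicted reliability of the 8-item form is about .78, which is the kind of loss a short form must try to contain through careful item selection.
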
What is the absolute minimum number of items that should constitute a scale? 
One item is certainly too few; internal consistency is not easily estimated and bal- 
anced keying is impossible. On the other hand, in unusual cases where the construct 
being measured is highly familiar (or "schematized") to respondents,
unidimensional, and primarily subjective in content, one item could be adequate (Robins et al.,
2001). Although two-item scales have neither of the disadvantages of one-item 
scales, internal consistency tends to be purchased at the cost of extreme narrowness 
of breadth. With three-item scales, unbalanced keying is again a problem. Thus, 
four-item scales seem to be a practical minimum in most cases. Such mini-scales 
have been referred to as "testlets," “item parcels," "homogeneous item composites," 
“factored homogeneous item dimensions,” and the like; and they have been used as 
the basic building blocks for longer scales by Comrey (1988), Hogan and Hogan
(1995), and Saucier (2001; in press). 

It may be, however, that the fewer items one selects from a large item-pool, the 
greater is the likelihood that their selection will have capitalized on chance charac- 
teristics of the derivation sample, leading to decreased internal consistency in new 
samples. Moreover, a high Coefficient Alpha in a four-item scale is typically only 
possible if the content is highly focused and narrow, so marker sets commonly in- 
clude more than four items. And, internal consistency is not the only reason for in- 
cluding more items. Nunnally and Bernstein (1994, p. 16) suggest that data from 
single items are ordinal, but aggregates of these items are more readily treated at the 
interval level of measurement. A scale including 8 or 10 items is likely to generate 
scores with a more Gaussian distribution than would a scale consisting of only four 
items. 


Application: Saucier's Big Five Mini-Markers 


Saucier (1994) scrutinized the performance of each of Goldberg's (1992) 100 uni- 
polar markers in 12 data sets, searching for those items that loaded most highly on 
the expected factor in virtually all analyses. After selecting an initial set of eight 
items for each factor based on this "factor purity" criterion, revisions were made to 
(a) increase user-friendliness by reducing the number of negations beginning with 
the prefix "un-," (b) decrease the number of root-negation pairs (e.g., Kind-Unkind) 
so as to lessen any overnarrowing of content, and (c) increase the correlation of the 
brief scales with the original 100 unipolar marker scales. After 9 such item- 
substitutions, the final 40-item set included eight items for each factor. In the case of 
Factors I (Extraversion), II (Agreeableness), and III (Conscientiousness) there were
four items for each pole of the factor. In the case of Factors IV (Emotional Stability) 
and V (Intellect/Imagination) a dearth of suitable terms led to the selection of six 
terms at one pole (low Emotional Stability, high Intellect) and two at the other pole. 

Internal consistency estimates for the five Mini-Marker scales were provided by 
Saucier (1994) in four data sets. The 20 Alpha coefficients ranged from .69 to .86, 
averaging around .80; these coefficients were generally about .07 lower than those 
for the longer 100-marker set. There are indications, however, that validity is
comparable with that for the longer marker set (Dwight, Cummings, & Glenar, 1998).
As would be expected from scales with lower internal consistency, Saucier (1994) 
noted that the Mini-Markers had lower inter-scale correlations than did the 100
unipolar markers from which they were derived. For example, the mean inter-scale
correlations for the 100 markers, which averaged .19 in raw data and .10 in ipsatized
data, were reduced in the Mini-Markers to .15 in raw data and .09 in ipsatized data.
Are inter-scale correlations of this size acceptable? Could they be reduced further by 
purposeful scale-construction procedures? 
Criterion 9: Mutual orthogonality among marker scales


In his critique of the Big Five model, Block (1995) pointed out that although the 
model is based on orthogonal factors, the five factors are normally operationalized 
with scales that are at least somewhat interrelated. In self-ratings, Goldberg's (1992) 
unipolar markers have intercorrelations as high as .37, and in peer ratings as high as 
.58.⁹ Indeed, Digman (1997) was able to develop second-order factors on the basis
of the intercorrelations among the scales within various five-factor marker sets. 
Even in data sets where the average intercorrelation is low, the correlation between a 
pair of markers can be quite high, and one such high correlation alone is enough to 
call into question the assumption of five "orthogonal" factors. 

Orthogonal factors are not necessarily better than oblique factors. But orthogonal 
factors are an advantageous feature of the Big Five model for at least two reasons. 
First, when one is mapping a domain of variables, as when one is mapping a physi- 
cal landscape, orthogonal axes provide a superior coordinate system for locating 
points on the map. Second, as Jackson (1971) noted, “if one wishes to maximize the 
predictability of a battery, entirely uncorrelated tests would be appropriate" (p. 246). 
Orthogonal predictors are more efficient in multiple-regression analyses because 
they minimize multicollinearity and maximize discriminant validity. 

It has long been known that marker scales based on orthogonal factors are not 
necessarily themselves mutually orthogonal (e.g., Cattell & Tsujioka, 1964). Recog- 
nition of non-orthogonality in marker scales has prompted some statistical remedies, 
such as (a) the ipsatization of the original response data, which tends to lower scale 
intercorrelations, and (b) the use of orthogonal (e.g., varimax) factor scores 
(Goldberg, 1992). Ipsatizing within sets of items that do not have balanced keying 
with respect to content can lead to inadvertently discarding content variance. 
Moreover, the most common form of ipsatizing, the use of standard (Z) scores, con- 
trols for between-subject differences in spread (variance) as well as central tendency 
(mean); while this practice has been explicitly recommended by Goldberg (1990; 
1992), it has recently been criticized by Hofstee et al. (1998).
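
Ipsatization of one respondent's ratings can be sketched as follows. This is a minimal illustration with invented ratings; the function name is ours, and the use of the population standard deviation is an assumption, since the text does not specify which estimator is used.

```python
from statistics import mean, pstdev


def ipsatize(responses, standardize=True):
    """Re-express one respondent's ratings relative to that respondent's own
    mean and, optionally, own spread (within-person standard scores)."""
    m = mean(responses)
    centered = [r - m for r in responses]
    if not standardize:
        return centered
    sd = pstdev(responses)
    if sd == 0:
        return centered  # a flat profile carries no within-person variation
    return [c / sd for c in centered]


# Two hypothetical respondents with the same profile shape but different
# elevation and spread receive identical ipsatized scores.
print(ipsatize([1, 2, 3]))
print(ipsatize([2, 4, 6]))
```

The example shows what the standard-score form removes: differences between respondents in both mean level and variance. Centering alone (standardize=False) removes only the former.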

The most direct method for assuring orthogonality is to use orthogonal factor 
scores instead of scale scores. One limitation of this procedure is that the factors de- 
rived de novo on each occasion are less uniform across samples than are scale 
scores. Perhaps as a consequence, most users of Big Five markers use simple (but 
correlated) scale scores based on raw data, eschewing both types of statistical reme- 
dies. Accordingly, a close approximation to orthogonality would be a desirable fea- 
ture in a Big Five marker set. 


9 Nor are high inter-scale correlations confined to lexical studies or adjective stimuli. The Revised
NEO Personality Inventory (NEO-PI-R) has domain-scale intercorrelations as high as -.53 in self-
ratings (Costa & McCrae, 1992).
Table 5. The orthogonal subset of the 100 unipolar markers (Ortho-40): Reliabilities and interscale
correlations

                               Derivation Samples                   Cross-Validation Sample
                       Self          Liked Peer    Pooled Peer      Community Sample
Coefficient Alpha
  I                    .84           .86           .86              .81
  II                   .73           .79           .92              .71
  III                  .86           .87           .89              .85
  IV                   .70           .62           .70              .71
  V                    [illegible]   [illegible]   .83              .74
Interscale Correlations
  I-II                 .03           .02          -.10              .03
  I-III               -.07          -.12          -.08              .04
  I-IV                 .00           .02          -.17              .05
  I-V                  .03           .06          -.13              .10
  II-III               [illegible]   [illegible]   .19              .12
  II-IV               -.05           .03           .30              .19
  II-V                 .04           [illegible]   .40              .06
  III-IV              -.03           .09           .14              .13
  III-V                .08           .06           .28              .06
  IV-V                -.06          -.04           .10             -.03
Mean Correlation
  Ortho-40             .01           .05           .09              .07
  100 Markers          [illegible]   .24           .27              .25
  40 Mini-Markers      [illegible]   .18           .26              .22

Note: Sample sizes: Self = 320; Liked Peer = 316; Pooled Peer = 205; Community Sample = 1,125.
All analyses used the original (non-ipsatized) response data. The 40 Ortho items: I = Bold,
Extraverted, Talkative, Unrestrained vs. Introverted, Quiet, Reserved, Shy; II = Kind, Sympathetic,
Undemanding, Warm vs. Cold, Demanding, Harsh, Unsympathetic; III = Efficient, Neat, Organized,
Systematic vs. Careless, Disorganized, Sloppy, Unsystematic; IV = Unenvious, Unexcitable vs. Anxious,
Emotional, Fearful, Fretful, Nervous, Touchy; V = Artistic, Complex, Creative, Deep, Introspective,
Philosophical vs. Simple, Unreflective.


The application of Criterion 7, emphasizing low divergent loadings, tends to sup- 
press inter-scale associations but not necessarily to remove them. If most of the po- 
tential marker items for a factor are associated in the same direction with another 
factor, simply choosing those items with the lowest divergent loadings will not serve 
to guarantee unrelated marker sets. To remove the inter-scale correlations, one must 
select marker items whose correlations with each of the other factors are balanced 
with respect to sign. Then, because scale scores that are uncorrelated in a derivation 
sample may not be uncorrelated in a new sample, one must demonstrate that the ap- 
proximation to orthogonality persists when the scale scores are intercorrelated in a 
new sample. 
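
The idea of balancing off-factor associations with respect to sign can be sketched as a greedy pruning routine. This is a simplified illustration, not the procedure used to derive any published marker set: the item names and loadings are hypothetical, and the criterion (the largest absolute sum of off-factor loadings) is our own stand-in for the scale intercorrelations discussed in the text.

```python
def net_secondary(items, loadings):
    """Sum each off-factor loading over the currently selected items."""
    n_other = len(next(iter(loadings.values())))
    return [sum(loadings[i][j] for i in items) for j in range(n_other)]


def imbalance(items, loadings):
    """Largest absolute net off-factor loading for the selected items."""
    return max(abs(v) for v in net_secondary(items, loadings))


def prune_to_balance(loadings, target_size):
    """Greedily drop the item whose removal leaves the smallest imbalance
    until only target_size items remain."""
    items = list(loadings)
    while len(items) > target_size:
        drop = min(
            items,
            key=lambda it: imbalance([i for i in items if i != it], loadings),
        )
        items.remove(drop)
    return items


# Hypothetical candidate markers for one factor, each with its loading on a
# single other factor; most lean in the positive direction.
candidates = {
    "a": [-0.10], "b": [0.05], "c": [0.30],
    "d": [0.35], "e": [-0.20], "f": [0.10],
}
kept = prune_to_balance(candidates, 4)
print(kept, round(imbalance(kept, candidates), 2))
```

Note that the routine does not simply discard the items with the largest off-factor loadings: item "c" is retained because its positive loading is offset by the negative loadings of "a" and "e". Whether such balance holds up must still be checked in a new sample, as the text emphasizes.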


Application: The "Ortho-40" markers 


Table 5 provides an illustration of the results of using orthogonality as a criterion for 
item selection. The 100 items in Goldberg's set of unipolar markers were scrutinized 
in the self- and peer-rating data sets used in Goldberg's (1992) Study 4. Items that 
contributed most to the positive scale intercorrelations were removed until eight
items remained per scale (with some priority given to maintaining balanced keying).
This 40-item subset is labeled the Ortho-40. Coefficient Alpha reliability averages 
about .10 lower than for the 100-marker scales, and about .03 lower than for the 
Mini-Marker subset (also based on 40 items). But inter-scale correlations are dra- 
matically lower than for either of the other two sets, on average about .15 lower per 
pair of scales. The highest inter-scale correlations are in the Pooled Peer sample, 
where the general evaluation factor has a powerful effect on these coefficients; in 
this extreme case, whereas one correlation in the 100 Markers reached .58 (Factors 
II and IV), the highest correlation in the Ortho-40 was .40 (Factors II and V). Over- 
all, the Ortho-40 sacrifices some internal consistency in order to gain greater mutual 
orthogonality. The Ortho-40 subset demonstrates that the Big Five are not oblique 
by necessity; if one has a sufficiently large item pool, it should be possible to de- 
velop a set of marker scales that are virtually unrelated. 

Another illustration of the application of Criterion 9 is provided by Saucier's 
(2000a) new Modular Markers, which have inter-scale correlations that are compa- 
rable to those of the Ortho-40, but with higher reliabilities. However, these Modular 
Markers were developed using an additional criterion, which must first be intro- 
duced. 


Considerations congruent with newer forms of measurement theory


None of the criteria offered so far are inconsistent with classical test theory 
(McDonald, 1999). However, from the standpoint of item response theory (e.g., 
Embretson & Reise, 2000) these criteria, which tend toward maximizing Alpha and 
homogenizing item difficulties, could lead to scales with a tendency to "parallelism."¹⁰
Strictly parallel items have the same difficulty levels (e.g., mean response)
and discrimination (e.g., item-total correlation) parameters; redundant items thus 
tend to be parallel. A set of relatively redundant items will have a high degree of in- 
ternal consistency. But a set of such items is problematic because it is likely to be 
overly narrow, which may decrease validity (see Criterion 4 regarding bandwidth). 
And it may distinguish well among individuals at one level of the broader construct 
but not at other levels. For example, a marker scale for Extraversion formed from 
the three items "Talks too much", "Can't stop talking", and "Chatters away even if
no one is listening" might effectively distinguish extreme extraverts from both mod- 
erate extraverts and introverts, but would probably do a poor job of distinguishing 
between the latter two groups. Nonetheless, this set of items should exhibit substan- 
tial internal consistency. A peaked, or kurtotic, test maximizes reliability (Lord, 
1952), and can be expected to show high levels of consistency across samples in ex- 
ploratory factor analyses. 


10 Criterion 4, stressing broad bandwidth, is the most likely exception, since a measure of a broad 
attribute is unlikely to result from a set of redundant items. 
Our final criterion is derived from aspects of item response theory (IRT), which
has been widely applied to measures of ability and aptitude but has not yet had a 
major impact on personality measurement. Unfortunately, IRT analyses require 
relatively large samples and seem better suited to relatively specific, homogeneous 
content than the heterogeneous constructs of the sort personality psychologists have 
emphasized (Nunnally & Bernstein, 1994, p. 434). However, without adopting a 
full-scale IRT approach, one can still borrow at least one important IRT scale-
construction criterion.¹¹


Criterion 10: Equidiscrimination (discriminating at diverse levels) 


A contribution of IRT is its emphasis upon selecting items with a spread of difficulty 
levels in order to discriminate among (i.e., effectively differentiate) individuals at a 
variety of levels of the attribute. If one wanted to measure individual differences in 
the ability to solve arithmetic problems, one would not restrict one's questions to a 
single level of difficulty (e.g., only addition of single-digit integers, or alternatively 
only multiplication of twelve-digit numbers). Instead, one would include items cov- 
ering a range of difficulty levels to allow the measure to discriminate very high abil- 
ity from moderate ability, and very low ability from mere mediocrity. Tests that in- 
clude a wide range of item difficulty levels provide more information, and thus have 
broader bandwidth. 

In personality measurement, item difficulty levels index the "difficulty" respon- 
dents are likely to have in admitting to, ascribing, or agreeing with the content of the 
item, as indicated by inter-item variations in the response means. Items that are easy 
to endorse will tend to discriminate well only between those who are very low and 
moderately low on the attribute, whereas items that are difficult to endorse will tend 
to discriminate well only between those who are very high and moderately high on 
the attribute. Items with more intermediate response-means are prone to discriminate 
well in the middle of the attribute distribution, but not at either extreme. In most 
cases, classical item-selection procedures lead to a bias toward selecting items of 
intermediate difficulty (Nunnally & Bernstein, 1994, p. 329). 

One would expect that any item selection procedure that works to diversify the
content of the selected items will work against parallelism, and in favor of
discrimination at diverse levels of the attribute. Thus, five-factor measures like the
NEO-PI-R domain scales (Costa & McCrae, 1992) or Johnson's (2000) IPIP-NEO
short-form, which build up the score for a factor from subscales with diverse content,
are unlikely to be characterized by parallelism. However, the relative discriminatory
power of these measures at differing levels of the attribute remains to be
demonstrated.


11 Minimizing differential item functioning (DIF; also sometimes referred to as item bias) is another
scale-construction criterion prominent in IRT deserving of more attention and study with respect to
personality measurement. DIF exists whenever two items differ between groups in their parameters
(e.g., discrimination, difficulty level). As Nunnally and Bernstein (1994) advise, "one should choose
items whose parameters are most similar across groups, whether these parameters are defined
classically or through IRT. This is especially true when the groups differ in gender or ethnicity" (p.
417). One might divide one's data into subsamples based on gender or ethnicity and (a) eliminate
items with relatively poor discrimination in any subsample, or (b) retain those items that show the
smallest differences in parameters between subsamples.

A measure that discriminates well across levels of the latent attribute is most 
needed when important practical decisions are made about people based on their 
scores on a measure, and thus highly reliable distinctions at all levels of the attribute
are necessary; this criterion would be particularly valuable for any measure that is
used in a wide variety of selection situations. To the extent that factor markers are
used only for locating other variables, rather than locating individuals, such a
criterion may be less necessary. Nonetheless, because marker scales that were originally
developed for the purpose of locating variables (e.g., Goldberg, 1992) have since
become widely used as measures of individual differences, it may be sensible to
incorporate this criterion into marker construction from the outset.

To develop measures that discriminate well at various levels of the latent attrib- 
ute, Nunnally (1967) proposed a simple item-selection procedure for what he called 
the equidiscriminating (EQD) test. An EQD measure can be constructed by selecting 
items based on their characteristics at multiple cutoff levels. On the basis of the fre- 
quency distribution of the underlying attribute (e.g., the factor scores for a broad 
factor), one selects cutpoints between fractions of the distribution. For example, one 
can select one-third of the items to differentiate the top 25 per cent of the sample 
from the bottom 75 per cent, another third to differentiate the top half of the sample 
from the lower half, and a final third to discriminate the bottom 25 per cent of the 
sample from the top 75 per cent (Nunnally & Bernstein, 1994, p. 330). Then, one 
selects some items that served best to differentiate individuals above each cutpoint 
from those below it. There are other ways to reach the same equidiscriminating end 
result, of course, including procedures specific to IRT, but Nunnally's procedure 
may be the simplest to implement. 
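
A selection procedure of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not Nunnally's own algorithm in detail: the data are invented, the function names and the simple percentile rule are ours, and each item is evaluated by its correlation with a 0/1 split of the criterion at each cutpoint.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def cut_score(scores, pct):
    """Score at or below which roughly pct percent of the sample falls."""
    ordered = sorted(scores)
    idx = round(pct / 100 * len(ordered)) - 1
    idx = max(0, min(len(ordered) - 1, idx))
    return ordered[idx]


def eqd_select(items, criterion, cut_pcts=(25, 50, 75), per_cut=1):
    """For each cutpoint, keep the not-yet-chosen items that best separate
    respondents above the cut from those at or below it."""
    chosen = []
    for pct in cut_pcts:
        cut = cut_score(criterion, pct)
        above = [1 if s > cut else 0 for s in criterion]
        candidates = [name for name in items if name not in chosen]
        candidates.sort(key=lambda n: abs(pearson(items[n], above)), reverse=True)
        chosen.extend(candidates[:per_cut])
    return chosen


# Hypothetical criterion scores for eight respondents and three yes/no items
# that differ in how hard they are to endorse.
criterion = [1, 2, 3, 4, 5, 6, 7, 8]
items = {
    "easy": [0, 0, 1, 1, 1, 1, 1, 1],
    "mid":  [0, 0, 0, 0, 1, 1, 1, 1],
    "hard": [0, 0, 0, 0, 0, 0, 1, 1],
}
print(eqd_select(items, criterion))
```

In this toy example each cutpoint picks up a different item, so the resulting scale spans easy, intermediate, and difficult content rather than concentrating on items of intermediate difficulty.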


Application: Modular markers 


Saucier (in press) created a new set of marker scales for the Big Five, as well as 
scales for broader structures of one and two factors based on studies of natural- 
language descriptors. The label “Modular Markers” for these scales is based on the 
flexible use of item parcels serving in marker sets for the development of scales at 
more than one hierarchical level. These new scales were constructed so as to
simultaneously achieve three major objectives: relative orthogonality (Criterion 9),
higher internal consistency (Criterion 5) than was obtained with the Ortho-40, and
better equidiscrimination (Criterion 10) than previous marker sets.

The initial item pool consisted of 100 representative parcels, plus 21 supplemen- 
tary item parcels, also of two to four adjectives (Saucier, in press). For each of the 
Big Five factors, the distribution of factor scores based on analyses of personality- 
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Table 6. The factor loadings of the 32 parcels in the set of Modular Markers 


1 [Ш IV- У 
Kindness .77* .26 .03 .00 -.22 
Warmth 123° .20 .30 .08 207) 
Sympathy :735 221 ‚11 .18 .30 
Agreeableness .65* .08 . 708 -.06 a 
Sensitivity .60* .16 709 .43 : 
Toughness -.64* ‚04 ‚09 e .06 
уп -.55* -.09 2115, .09 .16 
Criticalness -.47* .05 .05 .25 217 
Demandingness -.47* .19 .22 Bi ‚11 
Efficiency uH .79* .03 -.11 .10 
Organization .07 7 -.04 -.05 -.01 
Perfectionism -.04 FAH -.07 11 ‚19 
Decisiveness .00 1555 ‚21 -.31 ‚15 
Caution .21 .50* -.31 .19 .10 
Ambition .08 .39* :32 .03 .18 
Forgetfulness -.02 2:975 -.01 .26 .01 
Talkativeness -.06 -.08 .70* .11 -.13 
Sociability .35 .20 .66* -.05 -.07 
Assertiveness -.37 227 .62* -.03 .20 
Spontaneity .18 -.16 2517 .26 „31 
Adventurousness -.03 -.05 .47* -.03 .34 
Restraint 310 .15 -.71* ‚10 ‚07 
Shyness 11 -.08 -.66* .23 3l 
Fretfulness -.16 -.20 -.22 65: -.08 
Anxiety -.20 -.12 -.01 .63* -.02 
Emotional Excitability 18 -.06 .39 2995 .08 
Jealousy/Envy -.34 -.20 .02 :95* -.11 
Hyperdevotedness 213 512 -.14 .48* .09 
Analytical Inquiry .01 NS -.02 .05 .81* 
Reflectiveness .20 S -.16 .07 .65* 
Intellectuality aui. .32 .09 -.12 52 
Unconventionality -.24 -.41 :21 -.03 o At 
Note: N = 1,620. Coefficients are varimax-rotated factor loadings; I = Extraversion (Dynamism); II 
= Agreeableness (Altruism vs. Antagonism); III = Conscientiousness (Self-Regulation); IV = Emotional 
Stability (reflected: Anxiety); V = (Autonomous) Intellect; * = Highest loading for each variable. 


descriptive adjectives in 14 data sets (Saucier, 2001) was dichotomized around 
cutpoints at the 16.67, 33.33, 50, 66.67, and 83.33 percentiles of the distribution.⁷ For 
each factor, each of the 121 candidate parcels was correlated with each dichotomy. 

The three highest-correlating parcels for each dichotomy were retained as part of 
the initial version of the marker scale. This initial version was revised so as to fur- 
ther reduce scale intercorrelations and also to better maximize correlations with the 
criterion factor scores. 
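
The selection step just described can be sketched in a few lines of Python. This is an illustration only, not Saucier's actual code: the function names, the linear-interpolation percentile, and the use of absolute correlations (so that negatively keyed parcels can qualify) are assumptions, and the default cutpoints are the three that Footnote 7 reports as useful.

```python
from math import sqrt

def pearson(x, y):
    """Pearson r; with a 0/1 variable this equals the point-biserial r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def percentile(values, p):
    """Linearly interpolated p-th percentile (0 <= p <= 100)."""
    s = sorted(values)
    pos = (len(s) - 1) * p / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def select_parcels(factor_scores, parcels, cutpoints=(16.67, 50, 83.33), keep=3):
    """For each cutpoint, dichotomize the factor scores and keep the `keep`
    parcels whose scores correlate most strongly (in absolute value, an
    assumption) with that dichotomy.

    `parcels` maps a parcel name to a list of scores, one per respondent.
    """
    selected = {}
    for p in cutpoints:
        cut = percentile(factor_scores, p)
        dichotomy = [1 if f > cut else 0 for f in factor_scores]
        ranked = sorted(
            parcels,
            key=lambda name: abs(pearson(parcels[name], dichotomy)),
            reverse=True,
        )
        selected[p] = ranked[:keep]
    return selected
```

Because each dichotomy splits the sample at a different trait level, parcels that survive at the extreme cutpoints are the ones that discriminate among very high or very low scorers, which is the point of the equidiscrimination criterion. The sketch assumes that no parcel is constant and that every cutpoint leaves respondents on both sides.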


7 The 33.33 and 66.67 cutpoints did not have incremental usefulness beyond the other three 
cutpoints, and thus it was not necessary to use them in this instance. 
Table 7. The Modular Markers: Reliabilities and Interscale Correlations 

                          Derivation Samples            Cross-Validation Sample 
                     Self   Liked Peer   Pooled Peer       Community Sample 
Coefficient Alpha 
  I                  .88       .89          .91                 .84 
  II                 .82       .86          .94                 .83 
  III                .85       .88          .91                 .86 
  IV                 .79       .75          .80                 .82 
  V                  .77       .75          .87                 .82 
Interscale Correlations 
  I-II              -.11      -.07         -.02                -.01 
  I-III              .11       .04         -.03                 .22 
  I-IV              -.06      -.05         -.10                 .08 
  I-V                .13       .26      [illegible]             .20 
  II-III             .01       .12         -.01                 .05 
  II-IV             -.09       .04          .28                 .28 
  II-V               .02       .00          .21                -.21 
  III-IV             .01       .13          .09                 .23 
  III-V              .04       .02          .23                 .03 
  IV-V               .00       .00          .18                -.02 
Mean Correlation 
  Modular Markers    .01       .05          .10                 .08 
  100 Markers        .13       .24          .27                 .25 
  40 Mini-Markers    .11       .18          .26                 .22 
  Ortho-40           .01       .05          .09                 .07 

Note: Sample sizes: Self = 320; Liked Peer = 316; Pooled Peer = 205; Community Sample = 592. All 
analyses used the original (non-ipsatized) response data. 


The end result was the set of parcels presented in the Appendix: 7 parcels (20 
items) for Extraversion, 9 parcels (22 items) for Agreeableness, 7 parcels (18 items) 
for Conscientiousness, 5 parcels (16 items) for Emotional Stability, and 4 parcels 
(14 items) for Intellect. Factor analyses of the 32 parcels (from 90 adjectives) 
making up the Big Five marker set indicated that the parcels reproduced the desired 
factors quite faithfully with either varimax or quartimax rotated solutions. The vari- 
max solution for a combined sample of 1,620 ratings is presented in Table 6. 

Table 7 provides Big Five Modular Marker scale intercorrelations, using original 
(non-ipsatized) responses in five samples; the comparable values for Goldberg's 
(1992) 100 Markers are also provided. The 100 Markers have roughly the same level 
of average inter-scale correlations as do most previous Big Five marker sets (e.g., 
Benet-Martinez & John, 1998; Costa & McCrae, 1992) — about .20. In contrast, the 
Modular Markers have an average inter-scale correlation of only about .05, similar 
to that of the Ortho-40 set presented earlier. The highest single inter-scale correla- 
tion found in any sample was only .28 (compared to .40 for the Ortho-40). However, 
the Alpha reliability coefficients of the Modular Marker scales are higher than those 
for the Ortho-40 by about .05 on average, as one would expect given their greater 
length (90 items instead of 40). These comparisons suggest that the Modular 
Markers may be slightly superior to the Ortho-40 as a set of mutually orthogonal 
marker scales. 
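
As a check on how the "Mean Correlation" rows of Table 7 can be read, the sketch below averages the ten interscale correlations for the Self sample. The values are those read from Table 7 (the I-III entry is taken to be .11); the function and variable names are illustrative assumptions. The signed mean comes to about .006, which rounds to the tabled .01, so the table appears to report signed rather than absolute means.

```python
from itertools import combinations

SCALES = ["I", "II", "III", "IV", "V"]

# Interscale correlations for the Self sample, as read from Table 7
# (the I-III entry is taken to be .11).
SELF_R = {
    ("I", "II"): -0.11, ("I", "III"): 0.11, ("I", "IV"): -0.06, ("I", "V"): 0.13,
    ("II", "III"): 0.01, ("II", "IV"): -0.09, ("II", "V"): 0.02,
    ("III", "IV"): 0.01, ("III", "V"): 0.04, ("IV", "V"): 0.00,
}

def mean_interscale_r(r, scales=SCALES):
    """Signed mean of the k(k-1)/2 correlations among k scales."""
    pairs = list(combinations(scales, 2))
    return sum(r[pair] for pair in pairs) / len(pairs)

def max_abs_r(r):
    """Largest interscale correlation in absolute value."""
    return max(abs(v) for v in r.values())

print(round(mean_interscale_r(SELF_R), 2), max_abs_r(SELF_R))
```

The largest absolute value is the more conservative summary, since positive and negative correlations cancel in a signed mean.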

The Modular Markers, with 90 adjectives, are of roughly comparable length to 
Table 8. The Mini-Modular markers (3M40): Reliabilities and interscale correlations 

                          Derivation Samples            Cross-Validation Sample 
                     Self   Liked Peer   Pooled Peer       Community Sample 
Coefficient Alpha 
  I                  .82       .84          .85                 .77 
  II                 .71       .76          .89             [illegible] 
  III                .76       .75          .84                 .76 
  IV                 .67       .63      [illegible]             .72 
  V                  .67       .64          .80                 .73 
Interscale Correlations 
  I-II               .02       .05          .01                 .09 
  I-III              .03      -.04         -.06                 .19 
  I-IV              -.05      -.09         -.17                 .06 
  I-V                .09       .15          .07                 .14 
  II-III             .01       .08         -.04                 .10 
  II-IV             -.05       .04          .26                 .24 
  II-V               .00       .10          .24                -.10 
  III-IV             .03       .09          .06                 .18 
  III-V             -.02      -.02          .08                -.04 
  IV-V              -.02       .06          .17                 .08 
Mean Correlation 
  3M40               .01       .04          .06                 .10 
  Ortho-40           .01       .05          .09                 .07 
  100 Markers        .13       .24          .27                 .25 
  40 Mini-Markers    .11       .18          .26                 .22 

Note: Sample sizes: Self = 320; Liked Peer = 316; Pooled Peer = 205; Community Sample = 592 for 
the 3M40 scales and 1,125 for the other marker sets. All analyses used the original (non-ipsatized) 
response data. The 3M40 items: I = Assertive, Playful, Sociable, Talkative vs. Quiet, Reserved, Shy, 
Withdrawn; II = Kind, Sentimental, Sympathetic, Tolerant vs. Cold, Critical, Demanding, Harsh; III = 
Cautious, Efficient, Meticulous, Organized, Perfectionistic vs. Absent-minded, Disorganized, 
Indecisive; IV = Unenvious, Unexcitable vs. Anxious, Emotional, Fearful, Fretful, High-strung, Nervous; V = 
Complex, Intellectual, Nonconforming, Philosophical, Unconventional vs. Conventional, 
Unintellectual, Unreflective. 


the 100 Markers, but they were developed using different criteria, reflecting dif- 
fering priorities. The 100 Markers were constructed with an emphasis on Criteria 5 
through 7 (Alpha maximization, Factor saturation, and Discrimination), and as 
would be expected their reliabilities are slightly higher than for their Modular 
Markers counterparts, which were developed with more emphasis on Criteria 9 and 
10. However, the use of a representative set of item parcels at the first stage of scale 
construction gives the Modular Markers some kinship to Goldberg’s (1990) 133 and 
100 clusters which we described earlier, with more emphasis on Criterion 4 than was 
true for the 100 Markers. Many of the parcels in the Modular Markers have balanced 
keying, which was true for none of Goldberg’s (1990) clusters. 

What if one were to apply the brevity criterion (Criterion 8) to the Modular 
Markers, and seek an abbreviated set? Table 8 provides internal consistency 
estimates and inter-scale correlations for a set of 40 Mini-Modular-Markers (3M40). 
This reduced set of adjectives was developed by selecting from the 90 Modular 
Markers a subset of items that (a) retained the highest-loading items with (b) about 
equal numbers having positive and negative loadings on each of the other factors, (c) 
while maintaining a spread of response means on each scale, with some secondary 
attention also to (d) maintaining balanced keying, (e) representing as many of the 32 
parcels as feasible, and (f) excluding items where doing so increased the internal 
consistency of the scale. Bases (a) through (f) correspond to our Criteria 6, 9, 10, 2, 
4, and 5, respectively. 
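
Basis (f), dropping items whose removal raises internal consistency, is commonly checked with an "alpha if item deleted" table. The following is a minimal sketch under stated assumptions: the data are hypothetical (five respondents, four items, the fourth unrelated to the rest), the function names are illustrative, and nothing here reproduces the authors' actual selection run.

```python
from statistics import variance

def cronbach_alpha(items):
    """Coefficient alpha for a list of items, each a list of respondents' scores."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

def alpha_if_deleted(items):
    """Alpha of the scale after dropping each item in turn."""
    return [cronbach_alpha(items[:j] + items[j + 1:]) for j in range(len(items))]

# Hypothetical scale: items 1-3 covary, item 4 does not.
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 4, 4, 5],
    [1, 3, 3, 5, 4],
    [5, 1, 4, 2, 3],
]

full = cronbach_alpha(items)
for j, a in enumerate(alpha_if_deleted(items), start=1):
    flag = "candidate for exclusion" if a > full else "keep"
    print(f"item {j}: alpha if deleted = {a:.2f} ({flag})")
```

In this toy example only the fourth item is flagged. In practice this basis has to be weighed against bases (a) through (e), since the item that most depresses Alpha may be the one that adds breadth.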

Compared to the full set of 90 Modular Marker items, inter-scale correlations for 
the 3M40 are about the same on average. But internal consistency is lower (almost 
.10 per scale on average) than for the longer marker set. Compared to the Ortho-40 
described earlier, the inter-scale correlations are similar, but the Alpha coefficients 
for the 3M40 scales are slightly lower (generally by less than .05). The lower inter- 
nal consistency is due to the higher degree of representative sampling in the 3M40 
scales. Although the two marker sets have nearly identical items for Emotional Sta- 
bility, on the other factors the 3M40 scales appear to be broader in content reference, 
primarily because the item pool in the Modular Markers has more breadth than that 
found in the 100 Markers on which the Ortho-40 was based. For example, the 
3M40's scale for Intellect has "unconventionality" content that is lacking in the 
Ortho-40 version (as well as the 100 Markers). Representative sampling does not 
maximize Coefficient Alpha, although it may heighten validity with respect to a 
broad array of criteria. Indeed, Saucier (in press) reported that the Ortho-40, Modu- 
lar Markers, and 3M40 Big Five marker sets demonstrated validities as high as the 
100 unipolar markers of Goldberg (1992) and the NEO-FFI (Costa & McCrae, 
1992) even though their Alpha coefficients were generally lower. 
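
How much of the Alpha difference between the long and short forms is a matter of length alone can be gauged with the Spearman-Brown formula. The sketch below uses the Self-sample values for scale I (Extraversion): Alpha = .82 for the 8-item 3M40 scale (Table 8) and .88 for the 20-item Modular Markers scale (Table 7). The function name is an illustrative assumption, and the formula assumes that added or removed items are psychometrically parallel, which the item-selection rules make unlikely here.

```python
def spearman_brown(alpha, length_ratio):
    """Projected reliability when a scale is lengthened (ratio > 1)
    or shortened (ratio < 1) by parallel items."""
    return length_ratio * alpha / (1 + (length_ratio - 1) * alpha)

# Lengthening the 8-item 3M40 Extraversion scale to 20 items:
projected_long = spearman_brown(0.82, 20 / 8)

# Shortening the 20-item Modular Markers Extraversion scale to 8 items:
projected_short = spearman_brown(0.88, 8 / 20)

print(f"projected 20-item alpha: {projected_long:.2f} (observed .88)")
print(f"projected 8-item alpha:  {projected_short:.2f} (observed .82)")
```

The projection from the short form overshoots the observed long-form Alpha, and the projection from the long form undershoots the observed short-form Alpha. One plausible reading is that the 3M40 kept the most saturated items, so its scales lose less reliability than random shortening would imply.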


Integrating diverse psychometric criteria for item selection 


Scale construction can serve any of many possible masters, but these masters can 
lead us in divergent directions. One might attempt to create marker scales based on 
all of the 10 criteria we have discussed, without realizing the extent to which some 
of these criteria are in conflict with each other. For example, maximizing the Coeffi- 
cient Alpha of a scale can be done at the expense of (a) maximizing the spread of 
item difficulties and (b) brevity. If such Alpha-maximization involves narrowing the 
content of the scale, validity could be attenuated over what it might otherwise be. If 
one seeks a representative sampling of variables in one's marker set, one is unlikely 
to achieve relatively orthogonal markers for orthogonal factors; and likewise, if one 
achieves orthogonal markers, it is probably at the expense of representative sam- 
pling. Uniform sampling, such as that used in the development of circumplex scales 
(e.g., Saucier, Ostendorf, & Peabody, 2001; Wiggins, 1980), will also tend to con- 
flict with representativeness, not to mention brevity. 

Thus, in most cases it will not be practical to apply all the criteria we have de- 
scribed to the construction of a single scale. We suggest that, instead, the criteria be 
integrated into a measurement paradigm in which each of the criteria is applied 
somewhere, but not necessarily everywhere. For example, one might build an initial 
set based on representative sampling of the domain, then select markers as a subset 
of this representative sample. One might utilize Alpha-maximizing approaches in 
creating item parcels but temper this standard with the "discrimination at diverse 
levels" criterion in aggregating the parcels into measures of broader attributes, 
whose mutual orthogonality could be systematically maximized if factor- 
orthogonality is important. Although these procedures are more complex, the si- 
multaneous consideration of diverse psychometric goals should lead to higher qual- 
ity measures than might otherwise be achieved. 


Recommendations 


We have presented a variety of English-language marker sets targeted at the Big 
Five. These marker sets differ with respect to the original item pool as well as the 
criteria used in constructing them. Which is the best marker set? With respect to 
predictive validity, Saucier (in press) compared all of the adjectival marker sets we 
have presented (except the 100 clusters) and found no meaningful differences; sur- 
prisingly, the 40-item marker sets appeared to have validities equivalent to those 
with more than twice as many items. The 100 unipolar markers and the Mini- 
Markers are especially geared toward factorial replicability — generating an in- 
tended structure in exploratory factor analysis of the constituent items. Due to their 
length, the 100 markers, and then the Modular Markers, typically have the highest 
Alpha coefficients, and thus would provide the most precise differentiation of 
individuals. On the other hand, the Mini-Markers, Ortho-40, and 3M40 all require less 
than half as much subject time as these more reliable marker sets. Finally, if one 
wishes to have more mutually orthogonal scale scores, one would choose the 
Modular Markers, Ortho-40, or 3M40. 

From another perspective, the 100 unipolar markers combine high Alpha coeffi- 
cients with factorial replicability. The Mini-Markers combine factorial replicability 
with brevity. The Modular Markers have more breadth and relative mutual or- 
thogonality, although Alphas are not quite as high as those for the 100 Markers. 
Both the Ortho-40 and 3M40 combine brevity and mutual orthogonality; given the 
item pool from which each was derived, the Ortho-40 is likely to 
have more factorial replicability and the 3M40 more breadth. 

These all appear to be good marker sets, but there is no single "best" one. Instead, 
what is best depends on how the investigator weights and values the various scale- 
development criteria. This is consonant with the overarching theme in our chapter: 
Trade-offs arise in the scale construction process that usually prevent one from 
generating a single perfect scale for a construct. We do tend to favor, however, 
scales that were developed taking a larger number of important criteria into account 
(such as the Modular Markers and their short form, the 3M40), on the grounds that 
these scales are less likely to have an "Achilles heel"; they are more balanced with 
respect to their virtues. Similarly, we encourage other investigators to take a broader 
view of scale construction, and to integrate a diverse range of useful criteria into the 
scale-construction process. 
Appendix. The 32 parcels included in the Big Five modular markers 


Spontaneity:             Impulsive, Spontaneous, Playful (.65, .59) 
Talkativeness:           Talkative, (-) Quiet (.61, .77) 
Sociability:             Sociable, (-) Unsociable, (-) Withdrawn (.73_?, .78) 
Assertiveness:           Dominant, Assertive, Forceful, (-) Timid (.68_?, .75) 
Adventurousness:         Daring, Adventurous, (-) Unadventurous (.77_?, .78) 
Shyness:                 Shy, Bashful (.83, .81) 
Restraint:               Inhibited, Reserved, Restrained (.52_?, .62) 

Warmth:                  Warm, (-) Cold (.64, .73) 
Sympathy:                Sympathetic, Compassionate (.75, .78) 
Sensitivity:             Sensitive, Sentimental (.48, .66) 
Kindness:                (-) Cruel, Kind (.56, .66) 
Agreeableness:           Agreeable, Tolerant, Lenient (.52_?, .65) 
Toughness:               Rough, Tough, Stern, Harsh (.58, .73) 
Criticalness:            Critical, (-) Uncritical (.44_?, .63) 
Demandingness:           Demanding, (-) Undemanding (.53_?, .70) 
Slyness:                 Sly, Cunning, Shrewd (.59_?, .69) 

Organization:            Organized, (-) Disorganized (.80, .82) 
Caution:                 Careful, Cautious (.70, .77) 
Ambition:                Ambitious, (-) Unambitious (.78, .71) 
Decisiveness:            Decisive, (-) Indecisive (.58_?, .66) 
Efficiency:              Efficient, (-) Inefficient, (-) Careless (.70, .69) 
Perfectionism:           Perfectionistic, Exacting, Meticulous, Precise (.74_?, .75) 
Forgetfulness:           Forgetful, Absent-minded, Scatterbrained (.73_b, .76) 

Jealousy/Envy:           Jealous, Possessive, Envious, (-) Unenvious (.67, .76) 
Emotional Excitability:  Excitable, Emotional, (-) Unexcitable (.63_?, .72) 
Anxiety:                 Anxious, Nervous, High-strung (.73_?, .63) 
Fretfulness:             Fretful, Fearful (.43_?, .54) 
Hyperdevotedness:        Overloyal, Overprotective, Overconscientious, 
                         Oversentimental (.70_?, .61_?) 

Intellectuality:         Intellectual, (-) Unintellectual (.70, .71) 
Analytical Inquiry:      Philosophical, Deep, Complex, Analytical (.67_?, .70) 
Reflectiveness:          Introspective, Contemplative, (-) Unreflective (.57_?, .73) 
Unconventionality:       (-) Traditional, (-) Conventional, Unconventional, 
                         Nonconforming, Rebellious (.76, .74) 


Note: ESPS = Eugene-Springfield Community Sample combined with college peer-rating 
sample, N = 901; ABCD = Combined college-student samples A, B, C, and D, N = 1,028. 
Coefficients in parentheses are, respectively, coefficient alpha in ESPS and ABCD; 
subscript letters (written here after an underscore, with ? where the letter is 
illegible) indicate sample size for all items in parcel: a, N = 694; b, N = 596; 
c, N = 592; d, N = 841; e, N = 823. 
Validity and utility of the Revised NEO 
Personality Inventory: Examples from Europe¹ 
Friðrik H. Jónsson 


Introduction 


The Revised NEO Personality Inventory (NEO-PI-R; Costa & McCrae, 1985, 
1992b) was the first published questionnaire designed specifically to assess the Five- 
Factor Model (FFM) of personality. In the late 1970s, our research using a variety of 
questionnaire measures had led us to the conclusion that many traits could be orga- 
nized in terms of three factors: Neuroticism (N), Extraversion (E), and Openness to 
Experience (O; Costa & McCrae, 1980). At the same time, Goldberg's (1981) lexi- 
cal studies suggested that five factors were needed to account for traits named in the 
English language. Research comparing his structure with ours (McCrae & Costa, 
1985) convinced us that we needed to add Agreeableness (A) and Conscientiousness 
(C) factors to our model, and we developed scales to measure them. 

The resulting instrument has become the most widely used measure of the FFM, 
or “Big Five," as Goldberg (1993) and others working in the lexical tradition usually 
call this model. The NEO-PI-R differs from most adjective-based operationalizations 
of the FFM chiefly in two respects. First, Factor V in the lexical tradition is usually 
called Intellect, and emphasizes self-reported cognitive abilities; the corresponding 
NEO-PI-R factor is called Openness to Experience, and covers a broader range 
of constructs (McCrae, 1994). Second, most adjective measures assess only the five 
broad factors, whereas the NEO-PI-R was designed from its inception as a hierarchi- 
cal instrument (Costa & McCrae, 1995). Six specific traits, or facets, were selected 


¹ Portions of this chapter were presented at "Cultural Diversity and European Integration", the first 
Joint European Conference of the International Association for Cross-Cultural Psychology and the 
International Test Commission, Graz, Austria, June 28 - July 2, 1999. 
Table 1. Characteristics of the Revised NEO Personality Inventory (NEO-PI-R) 


Scales: Five domains (Neuroticism, Extraversion, Openness to Experience, Agreeableness, Consci- 
entiousness) and thirty facet scales, six per factor. Factors can be scored using factor weights 
(Costa & McCrae, 1992b, Table 2). 


Description: 240-item (30-40 min.) questionnaire developed through rational and factor analytic 
methods. Responses range from Strongly Disagree to Strongly Agree, and scales are balanced to 
control for acquiescence. Form S is used for self-reports; Form R for observer ratings. Both paper- 
and-pencil and computer versions are available; the latter offers an interpretive report. A 60-item 
(10 min.) short form, the NEO Five-Factor Inventory (NEO-FFI), assesses only the five domains. 
Your NEO Summary provides feedback to respondents. 


Appropriate Populations: The NEO-PI-R was developed for use by college students and adults. It 
has been used extensively in both normal and clinical populations, for research and in clinical and 
industrial/organizational applications. Recent research suggests it can be used in adolescents as 
young as 12 (De Fruyt, Mervielde, Hoekstra, & Rolland, 2000), although modification of some 
items would be advisable. Form R has been used to rate individuals who are incapacitated or 
deceased. 


Reliability: Internal consistency estimates for the domains range from .86 to .93; for the facets, 
from .56 to .87; correlations between NEO-FFI and NEO-PI-R domains range from .77 to .92 (Costa 
& McCrae, 1992b). Two-year retest reliabilities range from .83 to .91 for domains and from .64 to 
.86 for facets (McCrae, Yik, Trapnell, Bond, & Paulhus, 1998). 


Validity: The NEO-PI-R has been used in over a thousand published studies and has demonstrated 
longitudinal stability, predictive utility, and consensual validation. Self/spouse and self/peer 
correlations range from .34 to .73 (Costa & McCrae, 1992b). NEO-PI-R factors have been related to 
most alternative measures of the Five-Factor Model, and facet scales have shown specific validity 
net of the five factors (McCrae & Costa, 1992). 


Cross-Cultural Generalizability: In addition to the European versions reviewed here, the NEO-PI-R 
or NEO-FFI has been translated into Hebrew, Arabic, Persian, Marathi, Telegu, Thai, Malay, Indo- 
nesian, Filipino, Chinese, Japanese, Korean, Shona, Xhosa, and Southern Sotho. Analyses of 
translations have provided evidence of generalizability (McCrae, 2001). 


Location: The NEO-PI-R is available from Psychological Assessment Resources, P. O. Box 998, 
Odessa, FL 33556, U.S.A., and on the web at www.parinc.com. 


to represent each factor — for example, Trust, Straightforwardness, Altruism, Com- 
pliance, Modesty, and Tender-Mindedness are the facets of A. Facets were selected 
on the basis of a review of the personality literature, and these distinctions have pro- 
ven useful in a variety of contexts (e.g., Jang, McCrae, Angleitner, Riemann, & 
Livesley, 1998). 

In research conducted chiefly in North America, the NEO-PI-R has shown 
longitudinal stability, heritability, and consensual validation (Costa & McCrae, 1992a). 
A substantial body of studies supports the hypothesis that the FFM covers the full 
range of personality traits (McCrae, 1989): Although some specific traits, such as 
physical attractiveness, may lie beyond the scope of the FFM, there is as yet no con- 
sensus on any additional major factors (but see Cheung & Leung, 1998; Piedmont, 
2000; and Waller, 1999 for some candidate factors). Details on the development of 
the instrument and evidence of its construct validity are provided elsewhere (Costa 
& McCrae, 1995, in press; Costa, McCrae & Dye, 1991; McCrae & Costa, 1983; 
Piedmont, 1998). An overview is provided in Table 1. 

Beginning in the 1990s, researchers around the world began to translate and adapt 
the NEO-PI-R (or its short form, the NEO Five-Factor Inventory or NEO-FFI). 
Research using these translations showed that the FFM structure is universal (McCrae 
& Costa, 1997), and that similar age (McCrae et al., 1999) and gender (Costa, 
Terracciano, & McCrae, 2001) differences are found across a wide variety of cultu- 
res, including those of Africa and Asia. In this article we will review some of the 
studies conducted using European versions of the NEO-PI-R and consider some of 
the practical and theoretical implications of this research. European translations are 
of particular interest because Europe is a center of personality research, and the vari- 
ety of languages and cultures provides a good test of the generalizability of the in- 
strument. 


A new personality inventory in the Old World 


According to the Sagas, the Old and New Worlds were first bridged by an Icelander 
named Bjarni Herjolfsson around the year 986 A.D., when his ship overshot Green- 
land and drifted to within sight of new shores. He did not land, but passed on his tale 
(and his ship) to the better-known Leif Ericsson (Brøndsted, 1965). In this article we 
describe a different bridge between the two Worlds: Just a thousand years after 
Bjarni’s sighting, the NEO-PI-R was published in America (Costa & McCrae, 
1985), and it has quickly spread throughout Europe. 

It is, of course, a mere conceit to suppose that Europe and America are really 
different Worlds. Contemporary psychology draws as much on Freud, Pavlov, and 
Piaget as it does on James, Allport, and Skinner. In particular, the multivariate trait 
psychology on which the NEO-PI-R is based was heavily influenced by the methods 
and findings of the British school, from Galton to Cattell (Costa & McCrae, 1976) 
and Eysenck (Costa & McCrae, 1986). The FFM was initially discovered in analyses 
of English language trait terms (Goldberg, 1993), but at least four of the factors have 
also been found in lexical analyses of a number of other European languages (De 
Raad, Perugini, Hřebíčková, & Szarota, 1998). There is thus every reason to expect 
that the NEO-PI-R should work well in Europe. 

Table 2 summarizes the present status of the instrument in Europe, including a 
few translations currently in progress. Authorized translations have been approved 
for research use on the basis of review of an independent back-translation; validated 
translations have been used in research and shown some degree of empirical support. 
Dutch, German, French, Polish, and other versions have been published (Borkenau 
& Ostendorf, 1993; Hoekstra, Ormel, & De Fruyt, 1996; Rolland, 1998; Zawadzki, 
Strelau, Szczepaniak, & Śliwińska, 1997), and a British adaptation of the English 
version is distributed by The Test Agency. The other completed translations are cur- 
rently available by license from the American publisher, Psychological Assessment 
Resources. 

Hambleton (1994) has discussed guidelines for adapting psychological tests 
across cultures. He notes that “the expertise and experience of translators is perhaps 
the most crucial aspect of the entire process of adapting instruments" (p. 235), and 
argues that translators should have familiarity not only with both languages, but also 
Table 2. European translations of the NEO-FFI and NEO-PI-R by language family 


Language              Version     Translator*        Status 
Altaic 
    Turkish           NEO-FFI     D. Sunar           Validated 
                      NEO-PI-R    S. Gülgöz          Validated 
Indo-European 
  Albanian 
    Albanian          NEO-FFI     D. R. Carney       In progress 
  Germanic 
    Danish            NEO-PI-R    H. Hansen          Validated 
    Dutch/Flemish     NEO-PI-R    H. Hoekstra        Published 
    German            NEO-FFI     P. Borkenau        Published 
                      NEO-PI-R    A. Angleitner      Published 
    Icelandic         NEO-PI-R    F. H. Jónsson      Validated 
    Norwegian         NEO-PI-R    Ø. Martinsen       Validated 
                      NEO-PI-R    L. Eriksen         Validated 
    Swedish           NEO-FFI     B. Hagberg         Authorized 
                      NEO-PI-R    H. Bergmann        Validated 
  Greek 
    Greek             NEO-PI-R    E. Besevegis       In progress 
  Romance 
    French            NEO-PI-R    J.-P. Rolland      Published 
    Italian           NEO-PI-R    G. V. Caprara      Validated 
    Portuguese        NEO-PI-R    M. P. de Lima      Published 
    Romanian          NEO-FFI     S. Borza           In progress 
    Spanish           NEO-FFI     J. F. Salgado      Validated 
                      NEO-PI-R    M. Avia            Published 
  Slavic 
    Bulgarian         NEO-PI-R    N. Alexandrova     In progress 
    Croatian          NEO-PI-R    I. Marušić         Validated 
    Czech             NEO-FFI     M. Hřebíčková      Published 
    Polish            NEO-FFI     J. Strelau         Published 
                      NEO-PI-R    J. Siuta           Authorized 
    Russian           NEO-FFI     M. Bodunov         Authorized 
                      NEO-PI-R    V. Oryol           Validated 
    Serbian           NEO-PI-R    G. Knezevic        Validated 
    Slovak            NEO-PI-R    I. Ruisel          In progress 
Semitic 
    Maltese           N domain    A. Borg            Authorized 
Uralic 
    Estonian          NEO-PI-R    J. Allik           Validated 
    Finnish           NEO-PI      L. Pulkkinen       Validated 
    Hungarian         NEO-PI-R    Z. Szirmák         Validated 


*Many translations are the work of teams; only the corresponding author is listed here. 


with the constructs of interest and the process of test construction. In practice, this 
means that translation should be entrusted not simply to competent bilinguals, but to 
bilingual personality psychologists. All the European translations were made by per- 
sonality psychologists; usually teams of translators were involved. Studies using 
European translations were conducted by European psychologists, knowledgeable 
about the test-taking experience and motivation of their subjects. Standard 
guidelines for translating the NEO-PI-R are available from the authors. 

Hambleton also lists a series of guidelines for establishing the interpretability of 
scores. It should be noted that use of a personality measure within a culture should 
be based on considerations of construct validity; whereas use of a personality meas- 
ure across cultures requires, in addition, demonstration of scalar equivalence — that 
is, evidence that the same raw scores indicate the same levels of the trait in the diffe- 
rent cultures (Van de Vijver & Leung, 1997). The amount of within-culture validity 
evidence varies across European translations, but in general, experience in Europe 
suggests that careful translations of the NEO-PI-R by qualified psychologists yield 
versions that consistently demonstrate construct validity. There is much less direct 
evidence to date on the scalar equivalence of these versions to the American NEO- 
PI-R, but exploratory analyses again suggest that good translations of the NEO-PI-R 
are likely to produce at least rough equivalence (McCrae, 2001). 

The entries in Table 2 have been organized by language family, and it is clear that 
most European languages are represented. In terms of numbers of speakers, the ma- 
jor omissions are Belorussian and Ukrainian, and most citizens of those countries 
are sufficiently fluent in Russian to use that version (personal communication, Z. 
Simakhodskaya, April 24, 1999). Lettish and Gaelic translations would complete the 
roster of branches of Indo-European. A Basque translation would add another langu- 
age family; unfortunately, being an extinct language, an Etruscan version is proba- 
bly out of reach. 

Without denying the value of indigenous constructs and measures, there is ob- 
viously much to recommend the use of the same instrument in many different cultu- 
res. Findings in one culture can suggest hypotheses for research in others; cultures 
themselves can be compared in terms of a common metric. The NEO-PI-R has fre- 
quently been chosen for this role because it claims to be comprehensive, and thus to 
provide a basis for systematic evaluations of personality. 


The Icelandic NEO-PI-R 


We illustrate the process of translating, validating, and interpreting the NEO-PI-R 
by considering the Icelandic version. Iceland is an island with a population of a 
quarter million. The original settlers — Norwegians and their Celtic slaves — arri- 
ved just over a thousand years ago and soon converted to Christianity. The Icelandic 
language, derived from Old Norse, has remained largely unchanged since that time; 
the Sagas of the 12th Century can still be read by modern Icelanders. Between 1262 
and 1904 Iceland was ruled by Norwegian and Danish kings, and it continues to 
show strong Scandinavian influences. In other respects, Icelanders resemble Ameri- 
cans — for example, in the high value they set on individualism (Jónsson & 
Olafsson, 1991). 

A few prior studies have examined personality variables in Icelandic groups. 
Hart, Hofmann, Edelstein, and Keller (1997) assessed personality in Icelandic 
children and found resilient, overcontrolled, and undercontrolled types resembling those 
found in American children. Bjoergvinsson and Thompson (1994) replicated the 
North American factor structure of the Basic Personality Inventory (Jackson, 1989) 
in a sample of Icelandic teenagers. Sigurdsson and Gudjonsson (1995) showed that 
drug-dependent prisoners scored lower on measures of socialization and higher on 
measures of Neuroticism and Psychoticism than other Icelandic prisoners. These 
findings suggest the hypothesis that personality variables function much the same in 
Icelandic and American samples. Experience with the Icelandic NEO-PI-R consti- 
tutes a further test of that hypothesis. 

The translation was made by FHJ and two students, following procedures sugge- 
sted by the test authors. During the translation, all items in each facet were grouped 
together to emphasize the construct of interest instead of the literal wording of each 
item. A back-translation into English was made by a State Authorized Court 
Interpreter unfamiliar with the NEO-PI-R. RRM reviewed the back-translation and 
identified 19 questionable items (7.9 per cent of the total). For example, the English 
Item 20, “I am easy-going and lackadaisical,” was initially back-translated as "I am 
calm and carefree," suggesting low N rather than the intended low C. Problems in 
the back-translation itself appeared to account for four of the questionable items; the 
remainder were reworded in Icelandic and back-translated again (Item 20 now 
appeared as “I am ambitionless and indifferent,” clearly low C), and this final 
version was reviewed and authorized. 

Similar procedures have been followed in all authorized translations, and most 
back-translations have been judged about 90 per cent accurate on the first review. 
This speaks well of the skill of the translators and the care they took in crafting their 
translations, but it also suggests that personality constructs themselves are rather 
easily conveyed in many different languages. 

In a series of studies, the Icelandic NEO-PI-R was administered to 337 men and 
women, aged 18 to 67. Most were psychology or social science students or their 
parents or friends. All volunteered, and all were tested individually, either in the 
laboratory or at home. Coefficient alphas for the five domains ranged from .83 to 
.91; for the 30 8-item facet scales, they ranged from .48 for O6: Openness to Values
to .81 for N3: Depression, with a median of .68. This value is only slightly lower 
than the American median value of .71, and comparable to values found in German, 
Italian, and Croatian versions (McCrae et al., 1999). High internal consistency, 
which often means only item redundancy (Cooper, 1998), has never been a priority 
in NEO-PI-R scale development. 
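
As a reminder of what the coefficient alphas above summarize, the following minimal sketch computes coefficient alpha for a single 8-item scale. The function name and the simulated responses are illustrative assumptions; they are not the Icelandic data.

```python
import numpy as np

def coefficient_alpha(items):
    """Cronbach's coefficient alpha for a respondents-by-items score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items in the scale
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the scale total
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Hypothetical data: 337 respondents, 8 items sharing one common trait.
rng = np.random.default_rng(0)
trait = rng.normal(size=(337, 1))
responses = trait + rng.normal(size=(337, 8))
alpha = coefficient_alpha(responses)
print(round(alpha, 2))
```

Because alpha rises with both the number of items and their average intercorrelation, a short facet scale with deliberately varied item content can be expected to show a lower alpha than a 48-item domain scale.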

When five factors are extracted from intercorrelations of the 30 facet scales in 
this Icelandic sample, varimax rotation yields a clear replication of the American 
factor structure, with factor congruence coefficients all .96 or .97. Table 3 reports a 
very slightly better fit from a targeted rotation (McCrae, Zonderman, Costa, Bond, 
& Paunonen, 1996). All facets load at least .40 on the intended factor, and all except 
E3: Assertiveness have their highest loading there. The similarity of secondary loa- 
dings can be evaluated by variable congruence coefficients, all of which exceed 
chance levels. At least by internal criteria, the individual facet scales show conver- 
Table 3. Factor structure of the Icelandic NEO-PI-R


                              Procrustes Rotated Principal Component    Variable
NEO-PI-R Facet                   N       E       O       A       C      Congruence
Neuroticism
  N1: Anxiety                   .84    -.15    -.03    -.01    -.05      .98**
  N2: Angry Hostility           .63     .05    -.01    -.49     .01      .99**
  N3: Depression                .80    -.27     .00     .02    -.18      .98**
  N4: Self-Consciousness        .74    -.21    -.17     .24    -.11      .96**
  N5: Impulsiveness             .42     .34     [?]     [?]    -.35      .99*
  N6: Vulnerability             .74    -.08    -.15     .04    -.32      .99**
Extraversion
  E1: Warmth                   -.14     .73     .27     .32     .16      .99**
  E2: Gregariousness           -.14     .77    -.07     .10     .01      [?]
  E3: Assertiveness            -.38     .41     .27    -.45     .22      [?]
  E4: Activity                 -.10     .55     .11    -.20     .31      .97**
  E5: Excitement Seeking        .02     .55     .00    -.18    -.13      [?]
  E6: Positive Emotions        -.14     .74     .16     .10     .14      .99**
Openness to Experience
  O1: Fantasy                   .16     [?]     .60    -.10    -.21      .99**
  O2: Aesthetics                .23     .13     .65     [?]     .10      .98**
  O3: Feelings                  .28     .44     .56     .08     .19      .98**
  O4: Actions                  -.30     [?]     .64    -.01    -.05      .99**
  O5: Ideas                    -.09    -.02     .72    -.06     .13      .99**
  O6: Values                   -.23     .07     .65     .13    -.04      .92*
Agreeableness
  A1: Trust                    -.34     .44     .15     .46     .05      [?]
  A2: Straightforwardness       .00    -.08    -.19     .68     .09      .97**
  A3: Altruism                 -.03     .39     .08     .55     .24      .98**
  A4: Compliance               -.16    -.09    -.06     .74    -.10      [?]
  A5: Modesty                   .35    -.12    -.15     .62     .02      .97**
  A6: Tender-Mindedness         .09     .09     .33     .56    -.12      .90*
Conscientiousness
  C1: Competence               -.41     .24     .27    -.12     .56      .96**
  C2: Order                     .08    -.01     [?]    -.05     .72      .97**
  C3: Dutifulness              -.08     .03    -.09     .27     .69      .97**
  C4: Achievement Striving     -.06     .21     .06    -.04     .79      .99**
  C5: Self-Discipline          -.24     .05     .01     .09     .85      .97**
  C6: Deliberation             -.08    -.31    -.14     .26     .52      .96**
Factor Congruence               .98**   [?]     .96**   .97     .98**    .97**


Note: N = 337. Loadings over .40 in absolute magnitude are given in boldface. *Congruence higher
than 95% of rotations from random data. **Congruence higher than 99% of rotations from random
data.


gent and discriminant validity, and the structure of personality traits in Iceland 
seems to be the same as in America. 
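
The factor and variable congruence coefficients discussed above can be illustrated with a short sketch. It rotates a simulated loading matrix toward a target by orthogonal Procrustes rotation and then computes Tucker's congruence coefficient for columns (factors) and rows (variables). The loading matrices, noise levels, and function names are hypothetical, and the sketch is a generic version of the technique rather than the exact procedure of McCrae et al. (1996).

```python
import numpy as np

def congruence(x, y):
    """Tucker's congruence coefficient between two loading vectors."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(x @ y / np.sqrt((x @ x) * (y @ y)))

def procrustes_rotate(loadings, target):
    """Orthogonally rotate `loadings` to best match `target` (least squares)."""
    u, _, vt = np.linalg.svd(loadings.T @ target)
    return loadings @ (u @ vt)

# Hypothetical target: 30 facets x 5 factors, six facets per factor.
rng = np.random.default_rng(1)
target = np.kron(np.eye(5), np.full((6, 1), 0.7))
target = target + rng.normal(scale=0.1, size=(30, 5))

# A simulated "translated" solution: same structure, arbitrarily rotated, plus noise.
q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
observed = target @ q + rng.normal(scale=0.05, size=(30, 5))

rotated = procrustes_rotate(observed, target)
factor_phi = [congruence(rotated[:, j], target[:, j]) for j in range(5)]   # per factor
variable_phi = [congruence(rotated[i], target[i]) for i in range(30)]      # per facet
print(np.round(factor_phi, 2))
```

Significance levels such as those flagged in Table 3 would additionally require comparing each coefficient with the distribution obtained by rotating random data to the same target, which this sketch omits.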

Is this a function of the similar value systems or Germanic languages of these two 
cultures? Probably not. Table 4 summarizes results of European tests of factor repli- 
cability; high congruences are found in every culture so far examined. While this is 
good news for personality psychologists, it is no longer really news, nor is it a uni- 
que property of the NEO-PI-R. The universality of the FFM has been demonstrated 
Table 4. Factor congruence coefficients comparing NEO-PI-R translations to the American normative
structure


                        Procrustes-rotated principal component
Sample (N)                N       E       O       A       C
Dutch (1,305)            .98     .97     .94     [?]     .98
German (1,324)           .97     .96     .97     .98     .97
Norwegian (379)          .96     .96     .94     .96     .97
French (447)             .97     .96     .94     .96     .97
Italian (697)            .97     .95     .96     .97     .97
Portuguese (2,000)       .98     .94     .89     .97     .97
Croatian (719)           .97     .96     .95     .97     .97
Russian (178)            .96     .95     .95     .94     .96
Serbian (422)            .96     .97     .95     .97     .97
Estonian (711)           .96     .97     .96     .97     .92


Note: Data from Hoekstra et al., 1996; Kallasmaa, Allik, Realo, & McCrae, 2000; Knezevic, Radovic,
& Opačić, 1997; Martin et al., 1997; McCrae & Costa, 1997; McCrae, Costa et al., 1996; Rolland,
Parker, & Stumpf, 1998; personal communication, H. Nordvik, May 5, 1999.


often, using different instruments (Paunonen et al., 1996) as well as different
cultures (McCrae & Costa, 1997).

Given the demonstration of factorial validity in Icelandic, there is a strong 
temptation to proceed immediately to an interpretation of mean levels. Do these des- 
cendants of the Vikings show the high levels of irascibility, excitement seeking, and 
arrogance, and the low impulse control that we might attribute to their fierce an- 
cestors (Magnusson, 1960)? Figure 1 shows mean NEO-PI-R profiles plotted against 
American adult norms. Men and women show very similar patterns, suggesting that 
gender differences in the American norms are preserved in Iceland. Overall, these 
Icelanders appear to be high in N, E, and O, and low in A and C; examination of
facet scales shows that they are also high in N2: Angry Hostility and E5: Excitement
Seeking, and low in A5: Modesty and C5: Self-Discipline. It does appear that
modern Icelanders have something of the temperament of Vikings.

But that conclusion is many steps ahead of the supporting evidence. To begin 
with, there is a powerful alternative hypothesis, because the pattern of high N, E, 
and O, and low A and C is familiar to students of adult development. Table 5 sum- 
marizes cross-sectional age trends from adolescence through middle age in ten Eu- 
ropean nations; although relatively modest in magnitude, the effects are generally 
significant in these large samples, and the direction of the effects is uniform. 
Perhaps the Icelanders depicted in Figure 1 share the temperament, not of Vikings,
but of adolescents and young adults.¹ In fact, the median age in this sample is 23.
This example can serve as a reminder that any interpretation of national character 


¹ The Vikings known to history were probably rowdy in part because they were young; life
expectancy was short in the Middle Ages. Those who lived longer, like the eponymous hero of Njal's
Saga, probably developed more maturity and self-restraint.


should be based on nationally representative samples, which this clearly is not.
Plausible as the age interpretation is, it is by no means certain that it is correct. Plotting
Icelandic data on American profile sheets presumes that raw scores are strictly com- 
parable (Van de Vijver & Leung, 1997), and that has not yet been demonstrated. 
Perhaps the most direct way to test the equivalence of scale scores is through studies 
of bilinguals who complete two versions of a scale. Bilingual studies would proba- 
bly be easy in Iceland, where English is widely spoken, but they have not yet been 
conducted. In fact, the only bilingual equivalence study conducted in Europe that 
we have been able to identify compared Estonian and Russian versions of the NEO- 
PI-R. Konstabel (1999) showed high cross-language correlations (medians of .88 
and .81 for domains and facets, respectively), but significant (though small) mean 
level differences for two domains and several facets. Experience with bilingual stu- 
dies in non-European languages (McCrae, Yik, Trapnell, Bond, & Paulhus, 1998; 
Piedmont & Chae, 1997) suggests that this is likely to be a typical result: Different 
language versions of the NEO-PI-R yield very similar, but not identical, mean va- 
lues. Where cultural differences in mean levels are the focus of interest, bilingual 
studies would seem to be essential. 


NEO-PI-R research in Europe 


In addition to bilingual studies, interpretation of the Icelandic NEO-PI-R would be 
enhanced by studies of cross-observer agreement, correlations with other scales and 
inventories, and research on personality development. We can probably anticipate 


Table 5. Cross-Sectional Adult Age Trends in Five Domains


                                        Domain
Sample (N)               N        E        O        A        C
Turkish (511)           Down     Down     Down     Up       Up
British (540)           Down     Down     n.s.     n.s.     Up
Dutch/Flemish           Down     Down     Down     Up       n.s.
German (3,442)          Down     Down     Down     Up       Up
Italian (690)           n.s.     Down     Down     Up       Up
Portuguese (1,880)      Down     Down     Down     Up       Up
Spanish (764)           Down     Down     Down     n.s.     Up
Croatian (702)          n.s.     Down     Down     Up       Up
Czech (912)             Down     Down     Down     Up       Up
Russian (297)           n.s.     Down     Down     Up       Up
Estonian (598)          n.s.     Down     Down     Up       Up


Note: From Costa et al., 2000; McCrae et al., 1999; McCrae et al., 2000; personal communication,
F. De Fruyt, May 27, 1999.
the results of those studies, however, by examining research conducted elsewhere in
Europe. 

One of the unusual features of the NEO-PI-R is the provision of parallel self- 
report and observer rating forms. Agreement across methods of measurement con- 
stitutes one of the most powerful forms of evidence for the validity of personality 
scales, and American studies have shown substantial self/spouse, self/peer, and 
peer/peer agreement (Costa & McCrae, 1992b). Using the 60-item NEO-FFI in a 
German sample, Riemann, Angleitner, and Strelau (1997) reported self/peer and 
Spearman-Brown corrected peer/peer correlations ranging from .49 to .65 for the 
five domains. Zawadzki and colleagues (1997) showed uncorrected peer/peer
correlations of .26 to .53 in a Polish sample; they also showed self/peer correlations
ranging from .36 to .66. Ongoing analyses in Russian (T. Martin, personal
communication, June 4, 1999) and Norwegian (personal communication, Ø. Martinsen, May 28,
1999) samples show significant self/partner agreement for the domains and most of 
the facets of the full NEO-PI-R. 
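
The difference between corrected and uncorrected peer/peer correlations can be made concrete with the Spearman-Brown formula, which predicts the reliability of an aggregate of k parallel raters from the correlation between two single raters. The sketch below is illustrative only: the input value of .40 and the choice of k = 2 raters are assumptions, not figures from the studies cited.

```python
def spearman_brown(r, k=2):
    """Predicted reliability of a composite of k parallel raters,
    given the correlation r between two single raters."""
    return k * r / (1 + (k - 1) * r)

# Hypothetical single-peer agreement of .40, aggregated over two peers.
corrected = spearman_brown(0.40, k=2)
print(round(corrected, 2))  # 0.57
```

This is why corrected peer/peer values are not directly comparable with uncorrected ones: the correction describes agreement for averaged ratings rather than for a single rater.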

Both self- and peer-reports have been used in behavior genetic studies. One of the 
first studies on the heritability of factors beyond N and E utilized Swedish
translations of brief O, A, and C scales (Bergeman et al., 1993). Zawadzki and colleagues
(1997) used the NEO-FFI in a Polish sample of 546 pairs of twins, aged 17 to 64, to 
examine heritability of the five factors. All five were significantly heritable when
self-reports were examined (h²s = .30 to .57), and all but Agreeableness were
heritable in peer ratings (h²s = .52 to .71). Riemann and colleagues (1997) used German
self-reports and peer ratings to estimate heritability of latent variables; increased 
precision of measurement from the combined assessments led to much higher
estimates of heritability, ranging from .66 to .79. In an unusual cross-cultural behavior
genetics study, Jang et al. (1998) showed that NEO-PI-R facets were equally
heritable in German and Canadian twin samples.

In both the U.S. and Europe, the NEO-PI-R has begun to be used extensively in 
industrial/organizational contexts. De Fruyt and Mervielde (1997) showed that 
NEO-PI-R scores predicted vocational interests in a Belgian sample. Furnham, 
Crump, and Whelan (1997) reported that NEO-PI scales correlated meaningfully 
with British assessors' ratings of managerial capacity. Salgado and Rumbo (1994) 
demonstrated that the Conscientiousness scale of the Castilian NEO-FFI predicted 
job aspiration and performance among financial service managers. Martinsen (per- 
sonal communication, May 28, 1999) showed personality profile differences associ- 
ated with occupational groups in Norway: Artists were high in Openness, nurses in 
Agreeableness. 

A number of articles have examined relations between NEO-PI-R scales and 
other personality measures. Some of these studies replicate associations previously 
reported in American samples — for example, with public and private self- 
consciousness (Realo & Allik, 1998) or the Eysenck scales (Avia et al., 1995). 
Others have examined convergence between the NEO-PI-R and indigenous measu- 
res of the five factors — for example, in Italian (Caprara, Barbaranelli, Borgogni, & 
Perugini, 1993) and Dutch (Hendriks, Hofstee, & De Raad, 1999). Perhaps most 
interesting, however, are those that relate the NEO-PI-R to constructs of European 
origin. These studies speak to the comprehensiveness of the FFM and to the
comparability of more or less independent conceptual systems.

Strelau and Zawadzki (1995), for example, operationalized the Regulative Theory 
of Temperament in an instrument called the Formal Characteristics of Behavior — 
Temperament Inventory (FCB-TI). The Regulative Theory of Temperament is an 
elaboration of Pavlovian concepts, and the FCB-TI measures Briskness, Perseveran- 
ce, Sensory Sensitivity, Emotional Reactivity, Endurance, and Activity. Although 
Perseverance and Endurance sound as though they are forms of Conscientiousness, 
correlations with the Polish NEO-FFI show that they are related, in different directi- 
ons, chiefly to N. Perseverance, which might better be labeled Perseveration, refers 
to a rigid repetition of behavior and is associated with high N. Endurance refers to 
low arousability in the face of stimulation, and is related chiefly to low N. Openness, 
which is sometimes missing in lexical analyses (De Raad et al., 1998), is at least
partially represented in the FCB-TI in the form of Sensory Sensitivity; Agreeable- 
ness, however, is unrelated to any of these temperamental variables. 


Future directions 


It should soon be possible to administer the NEO-PI-R, or at least the NEO-FFI, to 
any European. All available data suggest that the instrument works reasonably well
in translation, and indeed, that findings from one country are usually generalizable 
to others (cf. Salgado, 1997). Recent trends toward political and economic unity in 
Europe seem to be paralleled by the demonstration of psychological unity, at least at 
the level of enduring dispositions. 

On a practical level, the availability of a common instrument should facilitate 
both research and application. Use in I/O psychology has already been mentioned;
the FFM and the NEO-PI-R also have promising roles in clinical psychology and 
psychiatry (Anderson, Barnes, Patton, & Perkins, 1999; Matthews, Saklofske, Costa, 
Deary, & Zeidner, 1998; Petot, 1994), behavioral medicine (Lemos-Giráldez & Fi- 
dalgo-Aliste, 1997), and educational and political psychology (Blickle, 1996; 
Riemann, Grubich, Hempel, Mergl, & Richter, 1993; Schouwenburg & Kossowska, 
1999). 

At a more theoretical level, the NEO-PI-R should be especially valuable in cross- 
cultural psychology. In this context, it is well to recall that European languages ex- 
tend well beyond the boundaries of Europe. The Russian NEO-PI-R has been used 
to examine acculturation of immigrants in the United States (Simakhodskaya, 2000); 
the English version has been used to compare Black and White college students in 
South Africa (Heuchert, Parker, Stumpf, & Myburgh, 2000); research on the validity
of the Portuguese version in Brazil is underway. The translations listed in Table 1 
offer useful tools for comparisons around the world. 

Two other research topics are of particular interest to us. First, although a good 
deal can be learned about personality development from cross-sectional comparisons 
of age differences across cultures (McCrae et al., 1999), the stability of individual
differences can only be addressed by longitudinal studies. Only a handful of such
studies have been conducted outside the U.S. (e.g., Thomae, 1976), and they did not 
use measures of the FFM. A body of parallel studies in several different cultures 
using the NEO-PI-R over, say, a ten-year interval would be enormously informative 
about the course of adult personality development. A Russian longitudinal study of 
self-reports and observer ratings is currently underway (Martin, Costa, Oryol, Ruka- 
vishnikov, & Senin, in press). 

Finally, there is keen interest today in studies of the molecular genetics of
personality. Because genetic influences appear to be similar across cultures (Jang et al.,
1998), such studies might be done anywhere. But early findings have proven
difficult to replicate (Ball et al., 1997), probably because so many hundreds of genes
influence personality that finding any one is a daunting task. New strategies are
needed and are likely to work best in highly homogeneous populations where
genetic noise is minimized. Sardinia provides one such opportunity, and research
including the Italian version of the NEO-PI-R is currently being planned. Another
opportunity, of course, is in Iceland, and as we have seen, the Icelandic NEO-PI-R is
ready for use there. 
Chapter 4


The Five-Factor Personality Inventory: 
Assessing the Big Five by means of brief and 
concrete statements 


A. A. Jolijn Hendriks
Willem K. B. Hofstee 
Boele De Raad 


Introduction 


The Five-Factor Personality Inventory (FFPI; Hendriks, 1997; Hendriks, Hofstee, & 
De Raad, 1999a, 1999b; Hendriks, Hofstee, De Raad, & Angleitner, 1995) is a
questionnaire for assessing a person's standing on the Big Five dimensions Extra- 
version, Agreeableness, Conscientiousness, Emotional Stability, and Autonomy. The 
FFPI consists of 100 brief and concrete behaviorally descriptive statements in the 
third person singular (e.g., takes charge, avoids company, takes others' interests into 
account). This item format can be used for other-ratings as well as self-ratings. In 
the latter case, it may stimulate the subject to take a more objective perspective. 
Ratings are made on a 5-point scale ranging from not at all applicable to entirely 
applicable. A person's position on each of the five dimensions is calculated by 
taking differentially weighted sums of all his or her 100 item responses (Hofstee, 
Ten Berge, & Hendriks, 1998). This scoring procedure according to rotated principal 
components maximizes internal consistency reliability and amount of variance 
explained (Ten Berge & Hofstee, 1999). 
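
The idea of scoring each dimension as a differentially weighted sum of all 100 item responses can be sketched as follows. This is only an illustration under stated assumptions: the ratings are random, the weights are derived from unrotated principal components of those simulated ratings, and the rotation step that the FFPI scoring procedure applies is omitted. The actual FFPI weights are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)
n_persons, n_items, n_factors = 500, 100, 5

# Hypothetical ratings on the 5-point scale (1 = not at all applicable).
ratings = rng.integers(1, 6, size=(n_persons, n_items)).astype(float)

# Standardize each item, then extract the five largest principal components.
z = (ratings - ratings.mean(axis=0)) / ratings.std(axis=0, ddof=1)
corr = (z.T @ z) / (n_persons - 1)
eigvals, eigvecs = np.linalg.eigh(corr)            # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1][:n_factors]
# One weight per item per component; the scaling yields unit-variance scores.
weights = eigvecs[:, order] / np.sqrt(eigvals[order])

# Each component score is a weighted sum of all 100 item responses.
scores = z @ weights
print(scores.shape)  # (500, 5)
```

In contrast to unit-weighted scale scores, every item contributes to every score here, with a weight that can be large, small, or negative.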


History and rationale of the FFPI 


The FFPI has been developed within the psycholexical paradigm (e.g., De Raad, 
2000; Digman, 1990; Goldberg, 1981, 1990; John, 1990; John, Angleitner, &
Ostendorf, 1988). In the psycholexical paradigm it is assumed that all behavioral 
differences that people encounter in their daily interactions become “sedimented” in 
everyday language (Cattell, 1943, p. 483; Goldberg, 1981). In other words, eventu- 
ally, people will have words to talk about these differences. Dictionaries record the 
most commonly used language units. Research on personality traits discriminating 
people or groups of people thus starts from an unabridged dictionary. The first 
systematic scan for personality-descriptive terms was carried out by Allport and
Odbert (1936), who recognized 4,504 stable traits among a total list of 17,953 terms 
that can “...distinguish the behavior of one human being from that of another" (p. 
24). It was not until the early 1940s, however, that insight was gradually gained into
the underlying structure of personality traits. After the pioneer studies of Cattell
(1943, 1945, 1947) and Fiske (1949), subsequent studies like those of Tupes and 
Christal (1961), Norman (1963), and Goldberg (1981, 1982) provided the founda- 
tions of what became known as the “Big Five" (Goldberg, 1981) or "Five-Factor 
Model" (FFM; McCrae & Costa, 1997).¹ Progress became more rapid once
computers became available, enabling more sophisticated techniques of large-scale
data analysis like principal components (factor) analysis. In the last two decades,
trait taxonomic research has been carried out by many researchers in many different 
countries. The real break-through of the FFM came about when factor solutions 
proved to be stable across methods of data collection and analysis (Goldberg, 1990; 
Peabody & Goldberg, 1989) and the model appeared not to be confined to sets of 
trait-adjectives, but was found to hold also in the domain of personality question- 
naires (Digman, 1990; Digman & Inouye, 1986; Ostendorf & Angleitner, 1992) and 
temperament inventories (Angleitner & Ostendorf, 1994). 

At present, the FFM is the most widely used working hypothesis of personality 
structure (McCrae & Costa, 1997), notwithstanding the fact that the model has been 
seriously criticized regarding number and nature of the factors (e.g., Block, 1995; 
Eysenck, 1991; McAdams, 1992; Paunonen & Jackson, 2000). Among psycholexi- 
cal psychologists consensus has been reached on four of the five factors, generally 
labeled: Extraversion, Agreeableness, Conscientiousness, and Emotional Stability 
(or, conversely, Neuroticism). The fifth factor is still in dispute, even among psy- 
cholexical psychologists (De Raad & Van Heck, 1994). This least replicable factor 
has been labeled variously as Culture (Tupes & Christal, 1961), Intellect (Goldberg,
1992; Ostendorf, 1990), Openness to Experience (Costa & McCrae, 1992), and 
Creativity or Imagination (Saucier, 1992). Recently, Autonomy has become another 
serious candidate (Hendriks, 1997; Hendriks et al., 1999b; Perugini & Ercolani,
1998; Rodriguez-Fornells, Lorenzo-Seva, & Andrés-Pueyo, 2001). 


¹ One of our reviewers suggested that we refrain from using "Big Five" and "FFM" interchangeably.
We feel however that there are no qualitative differences between the five-factorial model 
associated with studies based on the psycholexical approach to personality and the one operationa- 
lized in the NEO-PI-R, calling for distinctive names. As Goldberg (1993) already notes, there are 
many more similarities than dissimilarities. 
In the last decade, an increasing number of instruments to measure the Big Five
personality dimensions became available, as is illustrated by the contents of this 
book. The FFPI may be considered as a serious challenge to this set of instruments, 
for several reasons. Firstly, the FFPI has a direct link to the psycholexical approach 
to personality, by being based largely on the Abridged Big-Five Dimensional 
Circumplex model of personality traits (AB5C model; Hofstee & De Raad, 1991; 
Hofstee, De Raad, & Goldberg, 1992). Secondly, the FFPI is a broadly applicable
instrument, because of its item format and wording. Thirdly, the FFPI is parsimo- 
nious in comparison with most of the alternative Big Five questionnaires. Finally, 
the FFPI has been developed interactively in the Dutch, American-English, and 
German languages, to enhance translatability and, therefore, its chances of
cross-cultural applicability.

At the time that we constructed the FFPI, the majority of the instruments devel- 
oped especially to measure the Big Five were lists of trait-descriptive adjectives (cf. 
Briggs, 1992). Examples are the Five Personality Factors Test (5PFT; Elshout &
Akkerman, 1975), the Standard Personality Adjective Checklist (SPEL; Hofstee, 
Brokken, & Land, 1981), Goldberg's Big Five adjective markers (Goldberg, 1990, 
1992), and the Interpersonal Adjective Scale Revised for Big Five (IAS-R-B5;
Trapnell & Wiggins, 1990). Those specifically defined Big Five inventories avail- 
able (e.g., the revised NEO-Personality Inventory [NEO-PI-R] of Costa & McCrae, 
1992; the Big Five Inventory [BFI] of John, Donahue, & Kentle, 1991) or under 
construction (the Big Five Questionnaire [BFQ] of Caprara, Barbaranelli, Borgogni, 
& Perugini, 1993) consisting of personality-descriptive phrases still showed many 
items containing trait-adjectives (BFQ; NEO-PI-R) or, even, revolving around
trait-adjectives (BFI). Trait-adjectives are inherently abstract, because they summarize
behavior. As a consequence, they do not make up the best constituents of a ques- 
tionnaire to be administered to subjects with a wide range of educational levels. For 
the FFPI, we wrote items that convey trait meaning in statements as briefly, simply,
and concretely as possible. To this purpose, guidelines (Hofstee, 1991) were taken 
into account, to ensure creating items that would be applicable to a wide variety of 
subjects and settings, and would elicit ratings as objectively as possible. Items that 
were selected to constitute the FFPI contain no negations and were all found to be 
comprehensible by subjects with a low level of education. 

Currently, most researchers adhere to a hierarchical conception of the FFM. Each 
of the five more abstract dimensions (e.g., Extraversion) is taken to subsume a 
number of mid-level personality traits or "facets" (e.g., assertiveness, activity). 
Instruments according to a hierarchical model are characterized by rational scale 
construction. Because they are not rooted in factor analysis (Block, 1995; Saucier & 
Ostendorf, 1999), the proposed number and nature of the facets may differ widely. 
The NEO-PI-R (Costa & McCrae, 1992), for instance, comprises 30 facets (240 
items), six facets per Big Five factor (labeled: Neuroticism, Extraversion, Openness 
to Experience, Agreeableness, and Conscientiousness). Alternatively, the BFQ 
(Caprara et al., 1993) has only two facets per Big Five factor (labeled: Emotional 
Stability, Energy, Openness, Friendliness, and Conscientiousness). Empirical scale
construction would rather suggest three to four facets or "subcomponents" per Big 
Five factor (Saucier & Ostendorf, 1999). In fact, a hierarchical model is at odds with
the simplicity and parsimoniousness originally searched for by psycholexical 
psychologists. By conceiving the Big Five as a dimensional model (see below), a 
comparable amount of descriptive specification can be reached while staying within 
the five-dimensional trait space. With only 100 items, the FFPI yields scores on
the Big Five factors as well as facet scores. These facet scores specify a person's
position on distinct subclusters of traits of (pairwise combinations of) the Big Five. 
Differently from facet scores in a hierarchical model, however, FFPI facet scores do 
not account for additional variance over and above variance explained by the Big 
Five. Their rationale lies in the applied context only. 

Several authors point at the necessity to construct measures that encompass lower 
levels of specification than the Big Five, to enlarge the power of the FFM to predict 
important life criteria (e.g., Ashton, Jackson, Paunonen, Helmes, & Rothstein, 1995; 
Robertson & Callinan, 1998). It is much more efficient, however, to use the FFM 
primarily as an integrative structure in which other and more "dedicated" instru- 
ments (measuring, e.g., sensation seeking, workplace delinquency, or proactive 
personality) can be empirically positioned and compared than to try to encompass all 
lower levels of trait description that might help to enlarge the predictive validity of 
the FFM (cf. Hurtz & Donovan, 2000). Very likely, reliable and valid narrow 
measures will almost always outperform the broader personality dimensions in 
predictive power (e.g., Ashton, 1998; Crant & Bateman, 2000). 

Questionnaires are notoriously difficult to translate into other languages. When 
items need to be adapted for each language version, comparability of results among 
versions becomes questionable. As regards the FFPI, translatability of the items into 
at least two other languages (American-English and German) was taken as a pre- 
requisite for item selection, in addition to good psychometric properties. Our aim 
was to deliver a Big Five questionnaire that would be internationally applicable. 


Development of the FFPI 


Sources for item production 


The point of departure for item production was the AB5C model of personality traits
(Hofstee & De Raad, 1991; Hofstee et al., 1992). The AB5C model integrates the
classical Big Five simple-structure model and two-dimensional circumplex models
(Wiggins, 1979). The AB5C model takes into account that most traits appear to be
"blends" of two of the Big Five factors rather than factor-pure representatives: apart
from a high (primary) loading on one factor, these traits have substantial (secondary)
loadings on a second factor. In a simple-structure model traits are assigned to the
one factor on which they load highest. In a circumplex-model, traits are ordered 
along the boundary of a circle according to their loadings on two orthogonal factors. 


Figure 1. The I x II (Extraversion x Agreeableness) circumplex from Hendriks (1997)


Pairwise combinations of the orthogonal Big Five factors yield ten (1/2n[n-1] = 10, 
with n = 5) such two-dimensional slices, or circumplexes, of the five-dimensional 
trait space. Each circumplex is divided into 12 circle segments (facets) of 30° each,
each containing those traits that have their primary and secondary loadings (ap- 
proximately) in common (for further details, see Hofstee et al., 1992). In the 
Netherlands, 1,203 traits (Brokken, 1978) were assigned to the AB5C-facets accord- 
ing to their two highest loadings (De Raad, Hendriks, & Hofstee, 1992; Hofstee & 
De Raad, 1991). Figure 1 shows the I x II (Extraversion x Agreeableness) circum- 
plex as an example. 
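
The counting and assignment logic described above can be sketched in a few lines. The sketch enumerates the ten circumplexes and places a trait in one of the twelve 30° segments of a circumplex from its loadings on the two factors. The loadings are hypothetical, and the assumption that segments are centered on multiples of 30°, starting at the positive pole of the first factor, is ours; see Hofstee et al. (1992) for the actual assignment rules.

```python
import math
from itertools import combinations

factors = ["I", "II", "III", "IV", "V"]
circumplexes = list(combinations(factors, 2))   # pairwise combinations of factors
print(len(circumplexes))  # 10

def segment(x_loading, y_loading, width=30.0):
    """Index (0-11) of the segment containing a trait's loading vector.
    Assumes segments are centered on multiples of `width` degrees,
    with segment 0 centered on the positive pole of the x factor."""
    angle = math.degrees(math.atan2(y_loading, x_loading)) % 360.0
    return int(((angle + width / 2) % 360.0) // width)

# Hypothetical loadings on Extraversion (x) and Agreeableness (y).
print(segment(0.50, 0.30))    # 1: a I+II+ blend with its primary loading on I
print(segment(0.60, 0.05))    # 0: close to factor-pure I+
print(segment(-0.10, -0.55))  # 9: close to factor-pure II-
```

Under this convention, segments 0, 3, 6, and 9 lie on the factor axes and the remaining eight contain blends of the two factors.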

The AB5C model totals 90 facets (n[n-1], with n = 10 factor poles), as opposite
poles of the same factor cannot be paired by definition. All 65 well-filled facets were 
taken as a point of departure for writing items. The content of a facet is to be under- 
stood in a recursive way: by taking the shared meaning of the traits it contains, while 
contrasting it to its opposite cluster and centering it between its two neighboring 
clusters. Facet I+II+ (cheerful, cheery, genial, merry, cordial), for example, yielded 
Table 1. Guidelines for item production


Phrase items in the third person singular
Phrase items in observable terms
Avoid modifiers
Avoid suggestive wording
Avoid difficult words and expressions
Avoid negations
Avoid idiom
Avoid racist, sexist, ethnocentric and androcentric expressions
Avoid items mainly consisting of a personality-descriptive adjective, noun, or their combination
Avoid specification
Use proper Dutch


Note: Hofstee (1991).


the following items, among others: radiates joy, makes people feel welcome, gets 
along well with others. Based on the ABSC model, a total of 909 items were written. 

In addition, 136 items were written taking the list of 1,557 personality-descriptive 
verbs (De Raad, Mulder, Kloosterman, & Hofstee, 1988) as a point of departure (for 
details, see Hendriks, 1997). Examples are: insults people, cheers people up. Person- 
ality-descriptive verbs add meaning that is not found in trait-adjectives (De Raad, 
1992; De Raad et al., 1988). 

Finally, 266 additional items were written to cover Intellect. This strategy was 
chosen to gear the item pool to the American and German trait structures, because the 
Dutch Factor V is Rebelliousness or Spirit, rather than Intellect (Hofstee & De Raad, 
1991). The initial Dutch item pool thus consisted of (909 + 136 + 266 =) 1,311 
sentence items. 


Procedure for item production 


Items were written in three consecutive teams of five to ten members, roughly in 
correspondence with the three sources for item production: the AB5C model, 
personality-descriptive verbs, and trait-adjectives referring to Intellect. Team 
members independently produced items, which were discussed in team sessions as 
to whether they fulfilled the guidelines for item production (Hofstee, 1991) and the 
meaning of the segments. The guidelines for item production address an item's 
outline and phrasing (Table 1). They served the purpose of creating items for an 
instrument that can be used for a broad range of educational levels, avoids 
discrimination against certain people or groups of people, and, last but not least, elicits ratings 
as objectively as possible. Teams as a whole decided whether items should be kept, 
adapted, or rejected for the initial Dutch item pool. 


Establishing the final item pool 


The 1,311 sentence items were translated into American-English (in cooperation 
with Lewis R. Goldberg from the Oregon Research Institute) and German (by Alois 
Angleitner and his team from the University of Bielefeld). Only those items were 
retained for which good translations could be found in both languages; in the 
process, items were slightly adapted if necessary. Eventually, 914 items defined the 
final, trilingual, item pool. 


Item selection 


Item selection was carried out in two steps. In the first stage of the project, self- and 
other-ratings were collected on the 914 sentence items and 225 personality-descriptive 
adjectives representing the Dutch five-dimensional trait space. Subjects 
were 153 first-year students and 14 staff members of psychology. They rated 
themselves and were rated by two to four others who knew the target person well. 
All ratings were made on a 5-point scale running from much less than others to 
much more than others. For the large majority (133) of subjects, we received a 
complete set of one self- and four other-ratings. Because the adjective-based Big 
Five factor structure in self-ratings and other-ratings appeared highly similar (cf. 
Ostendorf, 1990), self- and other-ratings were pooled to increase precision and 
generalizability. The total sample consisted of 790 raters, about twice as many 
women as men, aged 15-80 years (M = 28 years, SD = 12.5 years). A principal 
components analysis (PCA) followed by varimax rotation was performed to establish 
the dimensionality of the item pool and to examine the relationship between 
the sentence-item and trait-adjective (Big Five) structures. The scree plot clearly 
indicated that five factors should be retained for rotation. Correlations calculated 
between subjects' scores on the five varimax-rotated sentence-based factors and the 
five varimax-rotated adjective-based factors amounted to .89, .88, .92, .85, and .88 
for Factor I to Factor V, respectively. The off-diagonal values were essentially zero, 
except for the one indicating the relationship between sentence-based Factor V and 
adjective-based Factor IV (.14). Each sentence item was assigned to one of the 90 
AB5C facets according to its largest projection in Big Five space, i.e., its two highest 
correlations with the five adjective-based factors (e.g., see De Raad et al., 1992; 
Hofstee et al., 1992). In practice, this means postmultiplying the matrix of correlations 
(factor loadings) by a matrix containing the proper sines and cosines (available from 
the first author) and identifying the largest (positive or negative) values. Largest 
projections (AB5C facet loadings) appeared to range from -.66 to .71, with a median 
absolute value of .47. The sentence items covered the AB5C model quite well: only 
10 facets were found empty (see Hendriks, 1997, Table 7). For each sentence item, 
we established the correlation between self-ratings and the averaged (per target) 
other-ratings. These self-(mean)peer validities ranged from -.08 to .66, with a mean 
and median value of .33. 
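
The facet assignment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' procedure: the actual sine and cosine matrix is said to be available from the first author and is not reproduced here. The sketch assumes that each pair of factors spans a circumplex with 12 directions spaced 30° apart, that an item's projection on a direction is its loading on the first factor times the cosine plus its loading on the second factor times the sine, and that the facet label lists the primary pole first. The names `facet_label` and `ab5c_facet` are hypothetical.

```python
import itertools
import numpy as np

FACTORS = ["I", "II", "III", "IV", "V"]


def facet_label(i, j, k):
    """Label of the direction at 30*k degrees in the circumplex of factors i and j."""
    theta = np.deg2rad(30 * k)
    c, s = np.cos(theta), np.sin(theta)
    # Treat floating-point residue (e.g. cos 90 degrees) as exactly zero.
    c = 0.0 if abs(c) < 1e-9 else c
    s = 0.0 if abs(s) < 1e-9 else s
    pole_i = FACTORS[i] + ("+" if c > 0 else "-")
    pole_j = FACTORS[j] + ("+" if s > 0 else "-")
    if s == 0.0:                      # pure pole of factor i
        return pole_i + pole_i
    if c == 0.0:                      # pure pole of factor j
        return pole_j + pole_j
    # Blend: the pole with the larger weight is the primary one.
    return pole_i + pole_j if abs(c) > abs(s) else pole_j + pole_i


def ab5c_facet(loadings):
    """Return (facet label, projection) for one item's five factor loadings."""
    loadings = np.asarray(loadings, dtype=float)
    best_label, best_proj = None, -np.inf
    for i, j in itertools.combinations(range(len(FACTORS)), 2):
        for k in range(12):
            theta = np.deg2rad(30 * k)
            proj = loadings[i] * np.cos(theta) + loadings[j] * np.sin(theta)
            if proj > best_proj + 1e-12:
                best_label, best_proj = facet_label(i, j, k), proj
    return best_label, float(best_proj)


# Enumerating all directions recovers the 90 distinct facets of the model.
ALL_FACETS = {facet_label(i, j, k)
              for i, j in itertools.combinations(range(len(FACTORS)), 2)
              for k in range(12)}
```

Under these assumptions, an item loading .60 on Factor I and .35 on Factor II falls in facet I+II+, because its projection on the 30° direction exceeds its projection on the pure I+ pole.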

Based on their projections and self-(mean)peer validities, as well as data on 
comprehensibility, observability, and social desirability of the items (for details, see 
Hendriks, 1997) as secondary criteria for item selection, a preselection of 284 
sentence items was made. We chose those items that showed the largest projections 
and highest validities, and, in addition, were found comprehensible, observable, and 
not too socially (un)desirable. Ratings on comprehensibility (1 = perfectly 
comprehensible, 3 = totally incomprehensible) were provided by 45 students from a local 
school for lower professional training (low level of education). Ratings on 
observability (1 = not or hardly observable to others, 5 = clearly observable to others) and 
social desirability (1 = very negatively, 2 = negatively, 3 = neutral, 4 = positively, 5 
= very positively) were provided by two samples of (90%) university students (N = 
43 for observability and N = 48 for social desirability) from a wide variety of 
disciplines. 

The 284 sentence items included 125 items (25 per factor) for each of two paral- 
lel instruments, and another 34 items that were part of the Acquiescence scale (see 
below, correction for acquiescence). For both versions, items were selected such that 
secondary loadings were balanced as much as possible; for instance, if two III+IV+ 
items were selected, two III+IV- items were also selected. However, the spread of 
the sentence items across the AB5C model did not allow this procedure to be followed in 
full. Therefore, items were selected such that correlations to be expected between 
the five scales were roughly the same for both versions of the instruments. 

Additional data were collected on the 284 sentence items and 225 trait-adjectives. 
Subjects were 125 first-year students of psychology. With the exception of eight of 
them, each target gave a self-rating and was rated by two to four others who knew 
the target person well. All ratings were made on a 5-point scale running from not at 
all applicable to totally applicable. The total sample consisted of 606 raters, almost 
twice as many women as men, aged 15-86 years (M = 28 years, SD = 13.5 years). 
This data set was combined with the one (N = 790) already available, resulting in a 
total of 1,311 raters after deletion of subjects with too many missing values (> 3%) 
or suspect response profiles (following L. R. Goldberg, personal communication, 
November 23, 1994). We considered a subject's response profile suspect if this 
subject's item responses consisted of long series of identical (extreme) values, or if 
the difference between synonym and antonym correlation for this subject was less 
than .70. Following Goldberg (ibid.), we computed a subject's synonym correlation 
across 40 (quasi) synonym pairs (8 per factor; correlation between the items of a pair 
on average: .62), and his/her antonym correlation across 40 (quasi) antonym pairs (8 
per factor; correlation between the items of a pair on average: -.50), from the 914 
items making up the final item pool. Compared to Goldberg ("We then culled 
subjects if they provided ... a synonym correlation less than .40 or an antonym 
correlation larger than -.30"), we used a less strict rule for culling: (synonym 
correlation minus antonym correlation) < (.40 - [-.30] =) .70. We substituted the 
scale midpoint ("3") for missing values in cases showing less than 3 per cent 
missing item responses. 
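
As an illustration of the culling rule just described, the following Python sketch flags a response profile when the synonym correlation minus the antonym correlation falls below .70. It is a minimal sketch under assumptions: the function names are hypothetical, each correlation is taken to be a Pearson correlation computed within one subject across the item pairs, and the check for long series of identical values is left out.

```python
import numpy as np


def pair_correlation(first, second):
    """Pearson correlation, across item pairs, between the two members of each pair."""
    return float(np.corrcoef(first, second)[0, 1])


def is_suspect(syn_a, syn_b, ant_a, ant_b, threshold=0.70):
    """Flag a profile whose synonym-minus-antonym correlation is below the threshold.

    syn_a[i], syn_b[i] are one subject's responses to the two items of synonym pair i;
    ant_a[i], ant_b[i] are the responses to the two items of antonym pair i.
    """
    difference = pair_correlation(syn_a, syn_b) - pair_correlation(ant_a, ant_b)
    return difference < threshold


# A consistent rater: synonyms agree (r = 1), antonyms mirror each other (r = -1).
careful = is_suspect([1, 5, 1, 5], [1, 5, 1, 5], [1, 5, 1, 5], [5, 1, 5, 1])

# An inconsistent rater: both correlations are zero, so the difference is 0 < .70.
careless = is_suspect([1, 5, 1, 5], [1, 1, 5, 5], [1, 5, 1, 5], [1, 1, 5, 5])
```

In the study itself the correlations were computed across 40 synonym and 40 antonym pairs; the four-pair vectors here only keep the example short.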

Prior to further analyses, a check was performed on the appropriateness of 
pooling the two data sets. In the first stage of the project, a comparative instruction 
had been used, while in the second stage of the project, an applicability instruction 
was used. It is quite conceivable that a rater finds certain behavior descriptions 
applicable ("4" under the applicability instruction) to a target person, while he or 
she would hesitate to state that the same descriptions apply more ("4" under the 
comparative instruction) to this target person than to others. The intra-individual 
spreads across items may thus differ significantly depending on the type of 
instructions given to raters. Furthermore, other-ratings tend to be more extreme than 
self-ratings (Hendriks, 1997, p. 15). Indeed, a two-way analysis of variance, with Rater 
and Instruction as the independent variables and subjects' intra-individual spreads 
across the 284 sentence items as the dependent variable, showed significant main 
effects for Rater (F[1,1307] = 13.37, p < .001) and Instruction (F[1,1307] = 461.27, 
p < .001); no interaction effect was found. Other-ratings yielded more variance than 
self-ratings and the applicability instruction yielded more variance than the 
comparative instruction. Thus, prior to further analyses, we corrected subjects' raw scores 
by dividing them by the standard deviation of the condition (self- or other-rating, 
comparative or applicability instruction). 

A PCA followed by varimax rotation performed on the 284 sentence items yielded 
five factors, according to the scree test, indicating that the five-factor structure 
was recovered in this selection of 284 items from the pool of sentence items. 
Congruence coefficients phi between the sentence-based factors and the adjective-based 
Big Five structure amounted to .91, .91, .92, .89, and .70 between varimax-rotated 
factors, and .93, .92, .91, .94, and .88 after procrustes rotation to optimal agreement, 
for Factors I to V, respectively. We further examined the relationship between the 
Dutch sentence-based structure and the one established in a large American sample 
(N = 766), kindly made available by L. R. Goldberg. These two data sets had 260 
sentence items in common. Congruence coefficients for varimax-rotated factors 
(based on 260 items) amounted to .94, .89, .87, .89, and .91 for Factors I to V, 
respectively. Additional measures for quantifying fit between matrices of factor 
loadings (McCrae, Zonderman, Costa, Bond, & Paunonen, 1996) supported our 
conclusion that the Dutch and American factor structures of the sentence items were 
remarkably similar. The overall congruence appeared to be .90 after varimax rotation and 
.91 after procrustes rotation to optimal agreement. At the level of items, phi 
coefficients ranged from .38 to .99, with a median value of .94; only 16 per cent of the 
sentence items had values below .85, the value taken to indicate factor 
congruence (Haven & Ten Berge, 1977). 
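
For readers who want to reproduce this kind of comparison, the sketch below computes a congruence coefficient between two loading vectors. It assumes that phi is Tucker's coefficient of congruence (the sum of cross-products divided by the square root of the product of the two sums of squares); the function name is hypothetical. Applied to two columns of loadings it gives a factor-level coefficient, and applied to two rows it gives an item-level coefficient.

```python
import numpy as np


def congruence(x, y):
    """Tucker's congruence coefficient between two vectors of factor loadings."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(x * y) / np.sqrt(np.sum(x ** 2) * np.sum(y ** 2)))


# Proportional loading patterns are perfectly congruent, whatever their size.
same_pattern = congruence([0.7, 0.6, 0.1], [0.35, 0.30, 0.05])

# Two items whose primary and secondary loadings are swapped.
swapped = congruence([0.6, 0.8], [0.8, 0.6])
```

Unlike a Pearson correlation, the coefficient does not center the loadings, so it is sensitive to the sign and relative size of loadings but not to their overall scale.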

Given the similarity between the Dutch and American structures, we decided to 
take the AB5C facet positions of the items in the American structure into account in 
the final item selection. To establish the final position of the axes, a two-sided 
procrustes rotation to optimal agreement was performed; the consensus matrix of the 
Dutch and American structures was varimax-rotated once more (Kiers, 1997). Each 
sentence item was assigned to one of the 90 AB5C facets according to its largest 
projection. 


Table 2. Mean values of the 100 FFPI items on the criteria for item selection 


                                  N     M      (SD)     Range        Median
AB5C-facet projection:ᵃ
  I    Extraversion              20    0.64   (0.05)   0.52-0.70    0.65
  II   Agreeableness             20    0.62   (0.07)   0.46-0.74    0.63
  III  Conscientiousness         20    0.61   (0.06)   0.52-0.71    0.62
  IV   Emotional Stability       20    0.64   (0.06)   0.53-0.71    0.64
  V    Autonomy                  20    0.57   (0.06)   0.47-0.69    0.58
Self-(mean)peer validity:
  I    Extraversion              20    0.51   (0.06)   0.41-0.65    0.50
  II   Agreeableness             20    0.40   (0.09)   0.27-0.57    0.41
  III  Conscientiousness         20    0.47   (0.09)   0.32-0.62    0.46
  IV   Emotional Stability       20    0.44   (0.09)   0.29-0.61    0.45
  V    Autonomy                  20    0.37   (0.09)   0.22-0.55    0.38
Comprehensibility:ᵇ
  I    Extraversion              20    1.01   (0.02)   1.00-1.04    1.00
  II   Agreeableness             20    1.01   (0.01)   1.00-1.04    1.02
  III  Conscientiousness         20    1.01   (0.02)   1.00-1.04    1.00
  IV   Emotional Stability       20    1.02   (0.02)   1.00-1.07    1.01
  V    Autonomy                  20    1.01   (0.02)   1.00-1.07    1.02
Observability:ᶜ
  I    Extraversion              20    3.76   (0.47)   2.81-4.33    3.70
  II   Agreeableness             20    3.82   (0.39)   3.00-4.51    3.79
  III  Conscientiousness         20    3.72   (0.43)   2.74-4.53    3.68
  IV   Emotional Stability       20    117    (0.48)   2.63-4.42    3.34
  V    Autonomy                  20    3.61   (0.31)   3.21-4.21    3.52
Social Desirability:ᵈ
  I    Extraversion              20    3.17   (0.85)   2.04-4.27    3.05
  II   Agreeableness             20    3.04   (1.02)   1.83-4.27    3.02
  III  Conscientiousness         20    3.10   (0.72)   1.88-4.40    3.21
  IV   Emotional Stability       20    3.02   (0.85)   1.85-4.23    2.67
  V    Autonomy                  20    3.06   (0.85)   1.90-4.08    3.25


Note: From Hendriks (1997). ᵃAbsolute values. ᵇ1 = Perfectly comprehensible, 3 = Totally 
incomprehensible. ᶜ1 = Not or hardly observable to others, 5 = Clearly observable to others. 
ᵈ1 = Very negative, 5 = Very positive. 


Selecting items for two parallel versions of the instrument appeared hardly feasible. 
Instead of two less optimal versions, we decided to construct one version 
containing the best items in terms of the primary and secondary criteria. Several 
item sampling plans were considered (Hendriks, 1997; Hendriks et al., 1999b). 
Eventually, we decided to select 20 good items per factor (having their primary 
loading on that factor), to be well spread across the different facets of a factor in 
order to avoid redundancy. The following criteria for item selection were applied: 
AB5C facet loading (projection), comprehensibility (≤ 1.07, on a scale from 1 
perfectly comprehensible to 3 totally incomprehensible), self-(mean)peer validity, 
observability, and nonextreme social desirability. Observability and nonextreme 
social desirability served as marginal criteria. Variety of item content was a 
requirement that led to the rejection of otherwise good items. Table 2 gives an overview 
of the values of the 100 selected items on the different criteria. Table 3 gives an 
overview of the spread of the 100 FFPI items across the AB5C model. 


Table 3. Spread of the 100 FFPI items across the AB5C Model 


Note: From Hendriks (1997). Column headings denote the primary loading of a factor, row headings 
denote the secondary loading of a factor. AB5C facet I+II+, for instance, is represented by 3 items. 


FFPI scale scores obtained by unit weighting of the item responses are correlated 
up to .50, due to shared secondary loadings of the items. Therefore, orthogonalized 
scale scores (factor scores) are the preferred units of analysis. A stand-alone (Pascal) 
scoring program that calculates a person's uncorrelated factor scores from his or her 
100 item responses can be obtained from the first author. The (five) factors are 
labeled: Extraversion, Agreeableness, Conscientiousness, Emotional Stability, and 
Autonomy. In spite of the purposeful overrepresentation of Intellect items in the 
pool of 914 sentence items and the list of 225 trait-adjectives, neither the 
trait-adjective-based nor the sentence-based (five-)factor solutions showed Intellect to be 
the core meaning of the fifth factor. The core meaning of the five factors and 
examples of items are given in Table 4. 


Table 4. Bipolar core meaning of the FFPI factors and examples of items (in italics) 


EXTRAVERSION 
Talkative                                       Silent 
Loves to chat                                   Uncommunicative 
Laughs aloud                                    Avoids contacts with others 

AGREEABLENESS 
Mild                                            Bossy 
Tolerant                                        Egocentric 
Is willing to make compromises                  Imposes his/her will on others 

CONSCIENTIOUSNESS 
Systematic                                      Undisciplined 
Precise                                         Sloppy 
Does things according to a plan                 Acts without planning 

EMOTIONAL STABILITY 
Can take his/her mind off his/her problems      Gets overwhelmed by emotions 
Is always in the same mood                      Is easily moved to tears 
                                                Is easily hurt 

AUTONOMY 
Can easily link facts together                  Follows the crowd 
Wants to form his/her own opinions              Copies others 
Analyses problems                               Agrees to anything 


The concept of (personal) autonomy stems from political philosophy and has to 
do with making one's own choices (Metaal, 1992). The underlying idea of the 
concept of autonomy is self-control and independence (Haworth, 1986). An 
autonomous person "subjects the norms with which he or she is confronted to critical 
evaluation and then proceeds to reach practical decisions by way of independent and 
rational reflection" (Young, 1986, p. 10). Riesman, Denney, and Glazer (1950, p. 
301) define autonomous people as questioners. According to these authors, "The 
'autonomous' are those who on the whole are capable of conforming to the 
behavioral norms of their society ... but who are free to choose whether to conform or not" 
(p. 287). Within psychology, autonomy is a less well-defined concept (Metaal, 
1992): it is referred to, among other things, as a developmental endpoint (Erikson, 1968, 
pp. 107-114), a basic need (Murray, 1938, p. 82), and a causality orientation, 
referring to the perceived source of initiation and regulation of behavior (Deci & 
Ryan, 1985). One way or another, autonomy is considered an important concept in 
almost all fields of psychology. These fields include clinical and developmental 
psychology (e.g., Bekker, 1993; Clark, Steer, Beck, & Ross, 1995; Mills, 1994; 
Ryff, 1989), educational psychology (e.g., Cronbach, 1977, pp. 51 ff.; Wong, 2000), 
health psychology (e.g., Knee & Zuckerman, 1998), organizational psychology (e.g., 
Breaugh & Becker, 1987), personality psychology (e.g., Deci & Ryan, 1987; 
Paunonen, Jackson, & Keinonen, 1990), and social psychology (e.g., Deci & Ryan, 
1987). Several instruments contain an Autonomy scale: the Edwards Personal 
Preference Schedule (EPPS; Edwards, 1954), the Personality Research Form (PRF; 
Jackson, 1984), the Interpersonal Dependency Inventory (Hirschfeld et al., 1977), 
the Adjective Checklist (ACL; Gough & Heilbrun, 1983), the General Causality 
Orientations Scale (Deci & Ryan, 1985), the Nonverbal Personality Questionnaire 
(NPQ; Paunonen et al., 1990), and the Autonomy Scale (Bekker, 1993). Compared 
with these other instruments, FFPI-Autonomy may come closest to the dominant 
conception of autonomy in political philosophy (Metaal, 1992), in which critical 
reflection is the core meaning. 


Correction for acquiescence 


Acquiescence variance in trait ratings may seriously disturb the factor structure of 
personality traits and should therefore be removed (Hofstee et al., 1998). Acquiescent 
responding is a tendency toward an "excentric" scale usage, which manifests itself 
through a deviation from the scale's midpoint (e.g., "3" on a 5-point scale) of a 
person's mean score on a sizeable number of pairs of items that are opposite in 
meaning (e.g., warm vs. cold, friendly vs. unfriendly). Acquiescence variance can be 
reliably established if a questionnaire contains enough (e.g., 25-30) such pairs, 
spread across this questionnaire's domains of content (Hendriks, 1997). Hendriks et 
al. (1999b) found that in normal samples the Acquiescence factor may account for 
well over eight per cent of the variance in trait ratings. At all stages of the project of 
the construction of the FFPI, PCAs were performed on the "acquiescence-corrected" 
item responses. Correction took place by subtracting a person's mean score on an 
Acquiescence scale from all of this person's raw item responses. Recently, more 
sophisticated techniques for correction, such as partialling, became available (Ten 
Berge, 1999). The Pascal scoring program mentioned above yields factor scores that 
are free of acquiescence variance. 
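
The subtraction-based correction described above can be sketched as follows. This is a minimal illustration, not the Pascal scoring program: the function name and the `acq_items` argument (the column indices of the items that form the Acquiescence scale, i.e., the pairs of opposite items) are assumptions made for the example.

```python
import numpy as np


def correct_for_acquiescence(responses, acq_items):
    """Subtract each person's mean on the Acquiescence scale from all raw responses.

    responses: persons x items array of raw ratings (e.g. 1-5).
    acq_items: column indices of the items making up the Acquiescence scale.
    """
    responses = np.asarray(responses, dtype=float)
    acq_mean = responses[:, acq_items].mean(axis=1, keepdims=True)
    return responses - acq_mean


# Two raters; the first two columns are one pair of opposite items.
raw = [[4, 4, 5, 3],
       [3, 3, 2, 4]]
corrected = correct_for_acquiescence(raw, acq_items=[0, 1])
```

In this toy example the first rater agrees with both members of an opposite pair, so his or her mean of 4 (rather than the midpoint 3) is removed from every response; the second rater's mean equals 3, so the corrected responses are simply deviations from the midpoint.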


Psychometric properties of the FFPI 


Reliability 


Internal consistency 


The internal consistency reliability (Cronbach's α) of the FFPI factors proved to be 
good (Table 5). At the time the FFPI was constructed, Hendriks (1997) found α 
reliabilities of .83 for Autonomy and .89 for the other four factors. In a variety of 
Dutch and one Flemish normal population samples (N = 104 to N = 2,494), mean αs 
ranged from .81 for Autonomy to .86 for Extraversion and Conscientiousness 
(Hendriks et al., 1999b). Comparable values were found in normal population 
samples (N = 97 to N = 678) of 11 other cultures (Hendriks, 
Perugini et al., 2001). In a sample of 237 Dutch (non-psychiatric) hospital patients, 
α ranged from .77 for Autonomy to .85 for Extraversion (Hendriks, 2000). In a 
sample of 105 German inpatients and outpatients with psychopathology, α ranged 
from .78 for Autonomy to .86 for Extraversion (Hendriks, Ostendorf, & 
Dieckmann, 2001). Others (Costa, Yang, & McCrae, 1998; Van Kampen, 2000; Van 
der Zee, Buunk, Sanderman, Botke, & Van den Bergh, 1999) who used the FFPI 
factor scores in their studies report only the internal consistency reliabilities of the 
scale scores, which are generally good. 


Table 5. Reliability of the FFPI factors 


                                                                         FFPI
Study                                Sample    Rating        N           I     II    III   IV    V
Internal consistency (α)
Hendriks, 1997                       Normal    Self/otherᵃ   1,311       .89   .89   .89   .89   .83
Hendriks, 2000                       Patient   Self          250         .85   ...   ...   ...   .77
Hendriks et al., 1999bᵇᶜ             Normal    Self/other    104-2,494   .86   .84   .86   .85   .81
Hendriks, Perugini et al., 2001ᵇᶜ    Normal    Self          97-678      .87   .84   .84   .84   .79
Hendriks et al., 2001                Patient   Self          105         .86   .80   .82   .85   .78
Perugini & Ercolani, 1998            Normal    Self          137         .89   .85   .83   .89   .83
Perugini & Ercolani, 1998            Normal    Other         226         .90   .86   .85   .91   .84
Rodriguez et al., 2001               Normal    Self          567         .84   .84   .84   .82   .78
On average:                                                              .87   .84   .84   .86   .80
Test-retest
Hendriks, 1997                       Normal    Self/otherᵃ   178         .79   .79   .83   .82   .79
Hendriks et al., 1999b               Normal    Self          1,768       .79   .74   .77   .75   .76
On average:                                                              .79   .77   .80   .79   .78

Note: I = Extraversion, II = Agreeableness, III = Conscientiousness, IV = Emotional Stability, V = 
Autonomy. ᵃCombined. ᵇAveraged values. ᶜAfter procrustes rotation to the Dutch target structure. 
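
The α values reported for the FFPI scales are Cronbach's alpha coefficients. As a reminder of what that statistic measures, here is a small Python sketch using the standard formula, α = k/(k-1) × (1 - sum of item variances / variance of the total score). The function name is hypothetical, and the sketch assumes that negatively keyed items have already been reversed so that all items of a scale point in the same direction.

```python
import numpy as np


def cronbach_alpha(scores):
    """Cronbach's alpha for a persons x items array of same-keyed item scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                                  # number of items
    item_variances = scores.var(axis=0, ddof=1).sum()    # sum of item variances
    total_variance = scores.sum(axis=1).var(ddof=1)      # variance of scale totals
    return float(k / (k - 1) * (1 - item_variances / total_variance))


# Four persons, two items that mostly but not perfectly agree.
alpha_example = cronbach_alpha([[1, 2], [2, 1], [3, 4], [4, 3]])

# Items that rank persons identically give the maximum value of 1.
alpha_parallel = cronbach_alpha([[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]])
```

With 20 items per factor, as in the FFPI, alpha rises with both the number of items and the average inter-item correlation, which is one reason the 10-item-per-factor short form mentioned under test-retest reliability is expected to yield lower values.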


Test-retest 


The FFPI showed good levels of stability (Table 5). Across a six-month time span, 
correlations (N = 178) between factor scores were found of .79, .79, .83, .82, and .79 
for Extraversion, Agreeableness, Conscientiousness, Emotional Stability, and 
Autonomy, respectively (Hendriks, 1997). Across a one-year time span, we found 
test-retest correlations (N = 1,768) of .79, .74, .77, .75, and .76 for Extraversion, 
Agreeableness, Conscientiousness, Emotional Stability, and Autonomy, 
respectively (Hendriks et al., 1999b). The latter are based on only 50 items (10 per factor), 
thus underestimating the true values. 


Validity 


Convergent validity 


Overall, FFPI Extraversion, Agreeableness, Conscientiousness, and Emotional 
Stability showed good levels of convergent validity with their respective 
counterparts in other Big-Five measures such as the NEO-PI-R (Costa & McCrae, 1992; 
Dutch translation: Hoekstra, Ormel, & De Fruyt, 1996), the Berkeley Personality 
Profile (BPP; Harary & Donahue, 1994), the Big Five Inventory (BFI; John et al., 
1991), Goldberg's Big Five adjective markers (Goldberg, 1992), and the Short 
Adjective Checklist to measure Big Five (SACBIF; Perugini & Leone, 1996). An 
overview is given in Table 6. Across studies, convergent correlations average .74 
for Extraversion, .60 for Agreeableness, .73 for Conscientiousness, and .71 for 
Emotional Stability. In Dutch (N = 125) and Flemish (N = 105) samples, convergent 
correlations with the NEO-PI-R were on average .79 (.80 and .77, respectively) for 
Extraversion, .65 (.69, .61) for Agreeableness, .78 (.75, .81) for Conscientiousness, 
and -.75 (-.83, -.67) for Emotional Stability (Hendriks et al., 1999b; see also 
Hendriks, 1997). Costa et al. (1998) report convergent validities between NEO-PI-R and 
FFPI in an American sample that are somewhat lower; in their study, however, the 
mean interval between administrations of the two instruments was 3.1 years, so the 
correlations partly indicate stability values. Substantial correlations for Extraversion, 
Agreeableness, Conscientiousness, and Emotional Stability were also found between 
FFPI and BPP (Hendriks et al., 1999b), BFI (Rodríguez-Fornells et al., 2001), 
Goldberg's markers and SACBIF (Perugini & Ercolani, 1998), and the 4-Dimensional 
Personality Test (Van Kampen, 1996), a Big Five related instrument. Autonomy is 
clearly less convergent with its FFM counterpart factors: correlations with Intellect 
(Perugini & Ercolani, 1998) or Openness to Experience (Costa et al., 1998; 
Hendriks, 1997; Hendriks et al., 1999b; Perugini & Ercolani, 1998; Rodríguez-Fornells 
et al., 2001) appear to be .40 at best. 

Autonomy seems to be a broader construct than Intellect and Openness to 
Experience, also encompassing "leadership": apart from being related to Need for 
Cognition (like Intellect and Openness) and Capacity for Managing New Situations 
(like Openness), Autonomy showed unique additional relationships with Self 
Awareness, (non-)Sensitivity to Others, and Generalized Self-Efficacy (Perugini & 
Ercolani, 1998). Rodríguez-Fornells et al. (2001) found Autonomy to be related to 
non-impulsive risk taking. 


Criterion validity 


Up to now, self-peer agreement figures are the sole indices of criterion validity 
available (Table 6). We compared self-ratings to the mean of two to four 
other-ratings and found correlations (N = 260) between factor scores of .73 for 
Extraversion, .70 for Agreeableness, .70 for Conscientiousness, .68 for Emotional Stability, 
and .54 for Autonomy (Hendriks, 1997; Hendriks et al., 1999b). Perugini and 
Ercolani (1998) compared self-ratings to (a) one other-rating and (b) the mean of 
five other-ratings. In the former case, correlations (N = 112) between factor scores 
ranged from .33 for Autonomy to .54 for Emotional Stability. In the latter case, 
correlations (N = 23) ranged from .45 for Autonomy to .77 for Conscientiousness. 


Predictive validity 


There is a growing body of evidence that personality pathology can be seen as the 
extremes of normal variation (e.g., Bagby et al., 1999; Costa & McCrae, 1990; 
Deary, Peter, Austin, & Gibson, 1998; Miller, Lynam, Widiger, & Leukefeld, 2001; 
Soldz, Budman, Demby, & Merry, 1993; Trull, 1992). In Big Five space, Emotional 
Stability and Extraversion are the two most important factors for explaining variance 
in personality disorders. However, instruments (questionnaire or structured 
interview), samples (normal or clinical), and type of rating (e.g., self- or clinician's) play 
a role in precisely which relationships are found between the Big Five and 
personality disorders (Soldz et al., 1993). In self-ratings, all FFPI factors appeared 
significantly and meaningfully related to personality disorders, as measured by the VKP 
(Duijsens, 1996), a self-report questionnaire; predominantly, however, relationships 
were found for Emotional Stability, Extraversion, and Conscientiousness (Hendriks 
et al., 1999b). These relationships are summarized in Table 6 by the ones for the 
DSM-IV and ICD-10 total scores (total number of criteria met) and are briefly 
discussed below. For each factor, results are presented in the order from strongest to 
weakest relationships. 

Emotional Stability showed significant negative correlations (|.17| ≤ r ≤ |.49|, 
mean r = -.36) with the anxious, depressive, borderline, dependent, impulsive, 
avoidant, paranoid, histrionic, and anankastic (obsessive-compulsive) disorders. 
These disorders mainly share the feature of overreactions to internal and external 
stimuli. Extraversion showed significant negative correlations (|.17| ≤ r ≤ |.39|, 
mean r = -.27) with the avoidant, schizoid, anxious, depressive, schizotypal, 
passive-aggressive, dependent, obsessive-compulsive, and paranoid disorders. These 
disorders mainly have social withdrawal in common. Conscientiousness showed 
significant negative correlations (|.17| ≤ r ≤ |.36|, mean r = -.24) with the schizotypal, 
antisocial (dissocial), histrionic, borderline, passive-aggressive, narcissistic, and 
impulsive disorders. These disorders share features of unreliability and lack of 
self-discipline. Autonomy showed significant correlations with the avoidant (r = -.36), 
anxious (r = -.24), and dependent (r = -.22) disorders. Finally, Agreeableness 
showed significant correlations with the narcissistic (r = -.25), dissocial (r = -.24), 
and histrionic (r = -.19) disorders. In FFPI other-ratings, generally the same, but 
weaker, relationships were found, although Extraversion was related to a smaller 
number of disorders than in self-ratings. 

In comparison with other Big-Five measures, some of the FFPI factors seem to 
tap slightly different aspects of behavior. For instance, FFPI Disagreeableness refers 
primarily to seeing to one's own needs first and taking advantage of other people, 
rather than antagonistic features like vindictiveness and lack of trust that are part of 
NEO Disagreeableness. Thus, not surprisingly, we found Agreeableness to be 
(negatively) related to the narcissistic but not the paranoid and borderline disorders. 
FFPI Unconscientiousness includes irresponsibility and neglect, in addition to a lack of 
orderliness, planfulness, and discipline. This may explain why we found 
Conscientiousness weakly negatively related to disorders that can be said to have some 
maladjustment in common: the schizotypal, antisocial, borderline, histrionic, and 
narcissistic disorders. However, finding these relationships with Conscientiousness 
is not uncommon. 

Adema, Van der Zee, and Van der Molen (2000) examined the relationship 
between personality and learning styles in a sample of 193 second-year students of 
psychology. They found significant zero-order correlations for all five FFPI factors. 
In accordance with expectations, particularly Conscientiousness and Autonomy 
appeared to be significant predictors: Autonomy primarily of the meaning-directed 
learning style (e.g., "I check whether the conclusions of the authors flow logically 
from the facts on which they were based"), Conscientiousness of the 
reproduction-directed (e.g., "I line up the most important facts and learn these by heart") and 
undirected learning styles (e.g., "I have little faith in my own study-capabilities"); 
see Table 6. These are meaningful relationships: whereas Conscientiousness refers 
primarily to working hard and systematically, Autonomy encompasses analytical 
engagement (Hendriks, 1997). Unexpectedly, Emotional Stability contributed at the 
level of subcomponents (not reported in Table 6): emotionally unstable students 
tended to use an undirected regulation strategy, being insecure about their own 
directions but unable to benefit from others' directions. Conceivably, strong feelings 
of anxiety inhibit proper thinking and listening to others. Also unexpectedly, 
Agreeableness appeared to be associated with the application-directed learning style (e.g., 
"If I have the possibility to choose, I especially take courses that seem useful for my 
future vocation"). This relationship may be sample-specific: most psychology 
students are oriented to study subjects that are of practical use and score high on 
Agreeableness. 

De Fruyt (1997) examined the relationship between personality and adult crying 
in a sample of 105 second-year students of psychology and their relatives. Except 
for Autonomy (no counterpart), almost identical correlations were found with FFPI 
and NEO-PI-R factors. As regards the FFPI, Emotional Stability and Autonomy 
were found to be associated with weeping frequency and weeping as a coping style; 
Extraversion and Emotional Stability proved to be associated with experiencing 
positive effects of weeping (Table 6). Emotionally unstable subjects and subjects 
low on Autonomy wept and perceived weeping as a coping style significantly 
more often than emotionally stable subjects and subjects high on Autonomy; rela- 
tionships were statistically significant even after partialling gender and age. Being 
overwhelmed by emotions is one of the characteristics of people scoring low on 
Emotional Stability. And, as far as crying is induced by situations one cannot 
handle, people high on Autonomy will be less likely to find themselves in such 
situations than people low on Autonomy. Emotionally stable and extraverted 
subjects experienced more positive effects and relief after crying than unstable and 
introverted subjects. No significant associations were found on experiencing nega- 
tive effects of weeping. It is quite conceivable that positive feelings vanish when 
crying becomes more of a habit (unstable subjects). Extraverts are in general more 
positive and more comfortable with expressing their feelings than are introverts. 

Nauta and Sanders (2000) examined the relationship between style of negotiation 
behavior and the Big Five personality factors as operationalized in the FFPI in a 
sample of 77 managers and employees in 11 manufacturing companies. A combina- 
tion of high extraversion and high agreeableness appeared significantly positively 
associated with problem solving behavior. A combination of high extraversion and 
low agreeableness appeared significantly positively associated with contending 
behavior (imposing one's preferred solution on the other party). However, unexpect- 
edly, the strongest associations were found for Emotional Stability and Autonomy 
(Table 6). Emotional Unstability (Neuroticism) inhibited contending behavior. 
Autonomy inhibited avoiding behavior (a [temporal] withdrawal from the conflict 
issue). Albeit unexpected, these are meaningful relationships, demonstrating the 
construct validity of the FFPI factors. Contending behavior, or competition, may be 
unattractive to individuals who are emotionally unstable, because such situations 
may upset them. Avoiding behavior in negotiations may contain an element of not 
grasping a conflict issue, or lacking the intellectual skills to approach an issue 
(Nauta & Sanders, 2000, p. 152). 

Van der Zee et al. (1999) examined the relationship between social comparison 
processes and the Big Five personality factors as operationalized in the FFPI in a 
sample of 112 patients with various forms of cancer. Personality, social comparison 
processes, and physical and psychological (depression, uncertainty, and mastery) 
well-being were assessed at the beginning of treatment; physical and psychological 
well-being were assessed again at the end of treatment. As expected, Neuroticism 
(Emotional Unstability) was associated with downward identification, i.e. identifi- 
cation with fellow patients who were doing worse. And, as expected, Extraversion 
was associated with both upward and downward identification, indicating that 
extraverts identify themselves with others more than introverts. Unexpectedly, 
however, Extraversion was most strongly associated with upward contrasting, i.e. 
comparing oneself to others who are doing better ("When I think about others who 
are doing better than I am, I sometimes feel frustrated about my situation", see Van 
der Zee et al., ibid.). But this, initially counterintuitive, finding could be explained 
also by a general tendency of extraverts to be oriented to the external world: to 
compare to others and strive for what others have achieved (cf. ambition). Neuroti- 
cism showed significant relationships with psychological well-being at the end of 
treatment, even after controlling for the level of psychological well-being at the 
beginning of treatment. A relationship between Neuroticism and psychological well- 
being is a well established finding (Costa & McCrae, 1992; Matthews, Saklofske, 
Costa, Deary, & Zeidner, 1998). 

To summarize, most authors (Adema et al., 2000; De Fruyt, 1997; Nauta & San- 
ders, 2000; Van der Zee et al., 1999) experienced some unexpected results. Still, all 
found meaningful relationships, providing evidence for the construct validity of the 
FFPI. 


Cross-cultural generalizability 


The FFPI has been translated into 17 languages as yet. Hendriks, Perugini and others 
(2001) examined the cross-cultural generalizability of the FFPI in 13 European and 
non-European nations. The data set encompassed representatives of the Germanic 
(Belgium, England, Germany, the Netherlands, USA), Romance (Italy, Spain), and 
Slavic (Croatia, Czech Republic, Slovakia) branches of the Indo-European langua- 
ges, as well as Semito-Hamitic (Israel) and Altaic (Hungary, Japan) language 
families. All samples except the smallest one (N = 97) showed clear five-factor 
structures. High congruence coefficients were found between each sample structure 
and Dutch and American large-sample reference structures (mean congruence values 
φ are given in Table 5). More than 80 per cent of the items were equally stable 
within each country. The internal consistencies of the five factors were generally 
good, as is illustrated by the mean αs after Procrustes rotation to the Dutch reference 
structure (.87, .84, .84, .84, .79, for I to V), given in Table 5. 


The FFPI in practice 


The FFPI can be used for self-ratings and other-ratings equally well. In case of self- 
ratings, the third-person item format invites the subject to take an objective perspec- 
tive. Completion of the FFPI will take a subject 10-15 minutes. The FFPI can be 
administered in private or in classroom settings. Its use is restricted to psychologists 
and other disciplines licensed for psychodiagnostics. 


Subjects and samples 


The FFPI is suitable for subjects of approximately 12 years of age and older (Szir- 
mák, 2001). An education completed at the level of primary school should be 
sufficient. All items of the FFPI fulfill the criterion of scoring below 1.07 on a 3-point 
scale ranging from 1 (perfectly comprehensible) to 3 (totally incomprehensible), 
as judged by 45 students from a local school for lower professional training, 
Groningen, the Netherlands. 

In the oldest age group, subjects may need personal assistance in the correct use 
of the rating scale: some people in homes for the elderly tended to answer the items 
with "yes" or "no" (Scheirs, Vingerhoets, & Hendriks, 1997). A good criterion for 
whether or not assistance is needed might be a person's living circumstances: if 
subjects are able to take care of their daily living themselves, they should also be 
able to complete the FFPI without any assistance. 

The FFPI is primarily meant for use in samples from the normal population. Pre- 
liminary findings suggest that the FFPI is reliable and valid also in samples of 
patients with physical complaints (Hendriks, 2000). Concerning its use with patients 
with psychiatric complaints, additional research is needed (Hendriks, Ostendorf et 
al., 2001). 


Instructions 


Instructions are the same for self- and other-ratings: "The enclosed list contains 
personality traits. Please fill in behind each trait the amount to which this trait is 
applicable to the above named person. Choices are: 1 (not at all applicable), 2 (little 
applicable), 3 (only moderately applicable), 4 (largely applicable), 5 (entirely 
applicable)." (Followed by an example.) "If you are in doubt, please compare the 
person to be rated with others you know well. Please don't skip traits". In case of 
self-ratings, "yourself" is mentioned as the "person to be rated". 

For (non-psychiatric) patients, it is advised to let the general instructions be pre- 
ceded by a situation-specific instruction, such as the following for hospital patients 
(self-ratings): "Please let your answers not be influenced by your current condition 
resulting from your illness or any other reason for having been hospitalized. What 
this questionnaire is about is how you came to know yourself in the course of your 
lifetime." (Hendriks, 2000). 


Scoring 


The FFPI is scored according to rotated principal components (Hofstee et al., 1998), 
which means that item weights are used to calculate a person's (five) uncorrelated 
factor scores from his or her 100 item responses. In research settings, a paper-and- 
pencil version of the FFPI is probably the one most widely used. Still, computerized 
scoring can and should take place; it takes less than two minutes to enter the 100 
item responses onto a computer. As mentioned before, a stand-alone (Pascal) scoring 
program is obtainable from the first author, but for scientific research purposes only. 
This scoring program makes use of the matrix of item weights B established in a 
large and representative Dutch sample (N = 2,494). In the near future, this B will be 
replaced by the matrix of item weights established in a cross-national validation 
study in 13 countries (Hendriks, Perugini et al., 2001). If a data set contains missing 
values, the scoring program substitutes a person's mean score (rounded to the 
nearest integer value) across the nonmissing items, separately for each factor pole, 
prior to calculating factor scores. Up to a total of 50 per cent of the item responses (5 
per factor pole) may be missing; however, with an increasing number of missing 
item responses, factor scores should be interpreted with caution. 
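
The substitution rule for missing values can be sketched as follows. This is a minimal Python illustration, not the Pascal scoring program itself: the pole labels, the function names, the half-up rounding convention, and the plain weighted sum in `factor_scores` are assumptions made for the example, and the actual item weights B are not reproduced here.

```python
import math


def round_half_up(value):
    """Round to the nearest integer, with .5 rounded upward (assumed convention)."""
    return int(math.floor(value + 0.5))


def impute_missing(responses, pole_of_item, max_missing_per_pole=5):
    """Replace missing responses (None) by the person's rounded mean
    across the nonmissing items of the same factor pole.

    responses    -- list of ratings (1..5) or None, one per item
    pole_of_item -- list of pole labels such as "I+" or "I-" (hypothetical keying)
    """
    filled = list(responses)
    for pole in set(pole_of_item):
        positions = [i for i, p in enumerate(pole_of_item) if p == pole]
        present = [responses[i] for i in positions if responses[i] is not None]
        n_missing = len(positions) - len(present)
        if not present or n_missing > max_missing_per_pole:
            raise ValueError(f"too many missing responses for pole {pole}")
        substitute = round_half_up(sum(present) / len(present))
        for i in positions:
            if filled[i] is None:
                filled[i] = substitute
    return filled


def factor_scores(filled, weights):
    """Weighted sums of item responses; weights holds one row of factor
    weights per item (illustrative values only, not the published matrix B)."""
    n_factors = len(weights[0])
    return [sum(r * w[k] for r, w in zip(filled, weights)) for k in range(n_factors)]


if __name__ == "__main__":
    answers = [4, None, 5, 2]
    poles = ["I+", "I+", "I+", "I-"]
    print(impute_missing(answers, poles))
```

Any centering or standardization applied by the real program is left out of this sketch; it only shows where the rounded pole mean enters before the weights are applied.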

A computerized version of the FFPI or mail-in scoring service for the paper-and- 
pencil version will become available from a commercial publisher upon completion 
of the manual in a particular language, which is, up to now, only the case in the 
Netherlands. Then specific information like facet scores can also be easily obtained. 
A disadvantage is that, once commercially published (like in the Netherlands), the 
FFPI can no longer be obtained for free. Note, however, that the utility of facet 
scores lies in the applied context only. Facet scores (e.g., V+I+) contain no specific 
variance over and above the variance accounted for by the two pertaining factors 
(i.e. Autonomy and Extraversion). Thus, for correlational research purposes, the five 
factor scores suffice. 


Anchored scores 


The FFPI-scoring device produces compatible anchored factor scores (Hofstee & 
Hendriks, 1998). These are standardized scores anchored at the scale midpoint: they 
preserve absolute information. Whereas regular or standard factor scores are cente- 
red at the mean of the population (M = 0, SD = 1), the mean of compatible anchored 
factor scores (anchored scores) in a population may deviate considerably from zero, 
in a positive (socially desirable) direction (M > 0, SD = 1). The difference may be 
illustrated by the following example. Imagine a person who fills in a set of Extraver- 
sion/Introversion items and scores on average "3.5" on a 5-point scale (Introversion 
items reflected). If the mean score of his/her fellow-subjects is higher, our subject 
would be reported to be slightly introverted (below the mean), relative to the others. 
With anchored scores, our subject would be reported to be slightly extraverted 
(scoring slightly above the scale midpoint). Compatible anchored factor scores and 
regular factor scores show the same factor structure: they are linear transformations, 
differing by only a constant, namely the mean anchored score (for details, see 
Hofstee & Hendriks, 1998). In other words, the two types of factor scores do not 
make a difference in findings in correlational research. The surplus value of ancho- 
red factor scores lies in the applied context. 
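
The contrast between the two kinds of scores can be illustrated with a small Python sketch. It is a simplification under stated assumptions: it works on mean item scores rather than on principal component scores, the group data are hypothetical, and the function names are invented for the example. For the actual procedure, see Hofstee and Hendriks (1998).

```python
from statistics import mean, pstdev

SCALE_MIDPOINT = 3.0  # midpoint of the 5-point rating scale


def standard_score(x, group):
    """Score centered at the group mean (relative information only)."""
    return (x - mean(group)) / pstdev(group)


def anchored_score(x, group, midpoint=SCALE_MIDPOINT):
    """Score centered at the scale midpoint (absolute information preserved)."""
    return (x - midpoint) / pstdev(group)


if __name__ == "__main__":
    # Hypothetical mean Extraversion item scores of five subjects
    group = [3.5, 3.9, 4.1, 3.7, 4.3]
    person = 3.5
    print("standard:", standard_score(person, group))  # below zero: "introverted"
    print("anchored:", anchored_score(person, group))  # above zero: "extraverted"
    # The two scores differ by a constant, the mean anchored score of the group
    constant = mean(anchored_score(x, group) for x in group)
    print("constant:", constant)
```

Because the difference is the same constant for every person, correlations with other variables are identical for both kinds of scores, as stated above.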


Contexts 


Like any other personality questionnaire, the FFPI can be used for (1) scientific 
research at the level of groups of people and (2) diagnostic purposes at the level of 
the individual. Concerning the latter, applied, purposes, one should think primarily 
of contexts in which it is in the interest of the client to be well advised, like in 
counseling or educational and vocational guidance; much more than in the former 
contexts, in the selection context convergent information from other measures (e.g., 
assessment center) is needed. In clinical practice, one may think of matching treat- 
ments to personalities (Miller, 1991). 

In clinical contexts, the need for additional ratings next to a patient's self-rating, 
e.g., from family members or others who know the patient well, might be self- 
evident. But also more generally, mean scores across a number of raters are to be 
preferred to one self- or other-rating, because averaging across raters reduces error 
variance (Hofstee, 1994). 

An application that explicitly asks for a number of raters is the assessment of 
"ideal" personality profiles, e.g., for specific professions, treatments, or occupations. 
The procedure would be that 5 to 10 experts complete the FFPI with the ideal 
student, patient, applicant, or whatever is the focus of interest in mind. Ratings are 
averaged across raters. Such profiles may help in clarifying expectations, conditions, 
or preconceived opinions on the part of the provider (educational institute, inpa- 
tient's or outpatient's clinic, employer). They may thus offer additional information 
(e.g., to students) or a point of departure for further discussion (e.g., definition of job 
requirements). When relevant, empirical and ideal profiles could be matched, e.g., 
by calculating their Euclidean distance in five-space (Hofstee, 1999): 
$\sqrt{(\mathrm{I}_e - \mathrm{I}_i)^2 + (\mathrm{II}_e - \mathrm{II}_i)^2 + (\mathrm{III}_e - \mathrm{III}_i)^2 + (\mathrm{IV}_e - \mathrm{IV}_i)^2 + (\mathrm{V}_e - \mathrm{V}_i)^2}$, 
with subscripts "e" denoting the individual's factor score and "i" the ideal factor score. 
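
As a minimal sketch of this matching procedure in Python: expert ratings are averaged into an ideal profile, and the distance to an individual's profile is computed over the five factor scores. The function names and the example numbers are hypothetical.

```python
import math


def mean_profile(ratings):
    """Average factor scores across raters.

    ratings -- list of profiles, each a list of factor scores (I to V)
    """
    n_raters = len(ratings)
    n_factors = len(ratings[0])
    return [sum(r[k] for r in ratings) / n_raters for k in range(n_factors)]


def profile_distance(empirical, ideal):
    """Euclidean distance between an empirical and an ideal profile."""
    if len(empirical) != len(ideal):
        raise ValueError("profiles must have the same number of factors")
    return math.sqrt(sum((e - i) ** 2 for e, i in zip(empirical, ideal)))


if __name__ == "__main__":
    # Hypothetical ideal-profile ratings from three experts (factors I to V)
    experts = [
        [1.0, 0.5, 1.5, 1.0, 0.5],
        [0.8, 0.7, 1.2, 1.1, 0.4],
        [1.2, 0.3, 1.8, 0.9, 0.6],
    ]
    ideal = mean_profile(experts)
    applicant = [0.2, 0.9, 1.0, 0.4, 1.1]
    print(profile_distance(applicant, ideal))
```

A smaller distance indicates a closer match; the distance weights all five factors equally, which may or may not be appropriate for a given application.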


Translations of the FFPI 


The FFPI has been translated into the following languages: Brazilian-Portuguese, 
Chinese, Croatian, Czech, (American- and UK-)English, German, Greek, Hebrew, 
Hungarian, Italian, Japanese, Norwegian, Polish, Rumanian, Slovakian, Spanish, and 
Swedish. Colleagues who would be interested in one of these versions or in produ- 
cing a translation into a language not yet mentioned may contact the first author. 


Existence of norms 


As yet, norms (Hendriks et al., 1999a) are available in the Netherlands only. They 
can be obtained from the publisher. 


Conclusion 


The psychometric properties of the FFPI in normal population samples are well- 
established by now. Reliability and construct validity proved to be good to excellent 
and were found to be remarkably stable across a variety of samples in many dif- 
ferent countries. Preliminary findings in patient samples are promising. Certainly, 
additional data need to be collected on predictive (criterion) validity. Furthermore, 
the instrument's validity in specific settings (e.g., the selection context) needs to be 
explored. The FFPI is a relatively “young” instrument and building the nomological 
network takes time. Given the results so far, however, our expectations are high. 


Chapter 5 


Studies of the Big Five Questionnaire 


Claudio Barbaranelli 
Gian Vittorio Caprara 


Introduction 


The Big Five Questionnaire has been developed for the assessment of the Big Five 
Model of personality traits (BFM) using a rational-based, or “top down" approach, 
that moves from theoretically defined personality dimensions to identify the appro- 
priate items representing measures of them (see Burisch, 1984). According to this 
approach, once the Big Five were identified as the high-order, most recurrent factors 
of personality, facets or subdimensions were identified from a scanning of the perti- 
nent literature, and sentence-items were produced to assess these constructs. Com- 
pared to other sentence-based questionnaires developed for measuring the Big Five 
(e.g., NEO Personality Inventory, Costa & McCrae, 1985; 1992; Hogan Personality 
Inventory, Hogan, 1986), the BFQ presents some special features. First, it represents 
a relatively parsimonious and economical measure of the Big Five in terms of num- 
ber of facets referred to in each primary dimension, and in terms of number of sen- 
tences produced. Second, it includes a scale designed to measure social desirability. 
Furthermore, a major distinctive feature with regards to the Big Five "orthodoxy" 
concerns the definition of Factor I. Although the more common label for the first 
factor of the Big Five has been "Extraversion", we preferred to use the label "Energy". 
This choice does not rest on a mere linguistic preference; rather, it derives 
from a close scrutiny of the adjectives found under Factor I in psycho-lexical studies 
(e.g., Caprara & Perugini, 1994; Goldberg, 1990; 1992; see also John, 1989; 1990), 
as well as from careful consideration of the meanings actually conveyed by the 
words "extraversion" and "extravert". Adjectives such as active, dynamic, energetic, 
lively, vigorous are usually found under Factor I. Moreover, in Webster's (1974) 
unabridged dictionary of American English one may find under extraversion “an 
attitude in which a person directs his/her interest to phenomena outside him-herself 
rather than to his/her own experience and feelings" (p. 652), and under extravert "a 
person whose interest is more in his/her environment and in other people than in 
him-herself, a person who is active and expressive" (p. 652). 

While it still holds that few personality constructs have remained as controversial 
and as productive of research over the years as extraversion-introversion (Carrigan, 
1960), we believe that research does not benefit from a dimension which often does 
not distinguish "ease in interpersonal relationships" from "activity, vigor, energy". 
Ultimately, "Energy" seemed more appropriate to convey the character of vigor, 
activity and strength of Factor I, as well as to capture important aspects of personal- 
ity that other taxonomies have considered under the labels of "activity" and "level of 
activity" (see, e.g., Buss & Plomin, 1984; Comrey, 1980; Guilford, 1975; Hogan, 
1986; Murray, 1938; Strelau, 1983; Zuckerman, 1994; Zuckerman, Kuhlman, 
Thornquist, & Kiers, 1991). Research on the biological basis of personality suggests 
the importance of the distinction between temperamental-constitutional features and 
social features of the construct of Extraversion (Eaves, Eysenck, & Martin, 1989; 
Eysenck, 1990). Likely, such a distinction would help clarify the patterns of genetic 
and environmental correlates found in Extraversion as well as its psycho- 
physiological correlates (Eaves et al., 1989). Accordingly, the BFQ Energy factor 
was defined to emphasize the dynamic-energetic aspects of Factor I, and to mark 
better its distinction from Factor II. However, this emphasis did not come at the expense 
of the interpersonal features of Factor I, which are still considered in many of the items 
included in the Energy scales, and are explicitly acknowledged in one of its facet 
scales, namely Dominance. 

The BFQ contains five domain scales and ten facet scales, plus a Lie scale de- 
signed to measure social desirability and the tendency to distort meanings of the 
scores. Table 1 presents short definitions of the domain and facet scales of the BFQ. 


Table 1. Domain and facet scales of the BFQ 


Domain Scales 
    Facet Scales 


Energy: Level of activity, vigour, sociability, talkativeness, assertiveness and 
dominance, competitiveness, leadership 
    Dynamism: Activity and enthusiasm 
    Dominance: Assertiveness and self-confidence 

Friendliness: Friendly compliance vs. hostility, trust, prosocialness 
    Cooperativeness: Altruism, empathy, generosity, unselfishness 
    Politeness: Kindness, civility, docility and trust 

Conscientiousness: Self-regulation in both its proactive and inhibitory aspects 
    Scrupulousness: Dependability, orderliness and precision 
    Perseverance: Capability of fulfilling one's own tasks and commitments, 
    tenaciousness, persistence 

Emotional Stability: Capability of controlling one's own emotional reactions, 
absence of negative affects, psychological well-being 
    Emotion Control: Absence of anxiety, depression and vulnerability, mood stability 
    Impulse Control: Capability of controlling irritation, discontent, and anger 

Openness: Broadness of cultural interests, tolerance toward differences, need and 
search for novelty 
    Openness to Culture: Intellectual curiosity, interest in being informed, 
    appreciation of culture 
    Openness to Experiences: Openness to novelty, tolerance of values, interest 
    toward diverse people, habits and life-styles 


Table 2. Rotated factor matrices (Comrey Tandem Criteria II) in the Italian normative sample 


                                       Factors 
BFQ Facet Scales              E       F       C       S       O 

Dynamism                     .76     .24     .03     .00     .26 
Dominance                    .60    -.27     .21     .01     .18 
Cooperativeness              .05     .63     .12     .00     .26 
Politeness                   .05     .71    -.04     .18     .09 
Scrupulousness              -.07     .00     .74     .00    -.05 
Perseverance                 .46     .06     .51     .08     .24 
Emotion Control              .16    -.04     .03     .80     .08 
Impulse Control      [illegible]     .34    -.06     .82     .03 
Openness to Culture          .06     .10     .28     .07     .64 
Openness to Experiences      .26     .18    -.05     .03     .69 
% of explained variance     12.5    11.9     9.5    13.6    11.3 


Note. E = Energy; F = Friendliness; C = Conscientiousness; S = Emotional Stability; O = Openness; N = 
9,333. 


Each facet scale contains 12 items; in order to control for possible acquiescence 
response set half the items are positively phrased with respect to the scale name and 
half are negatively phrased. The Lie (L) scale contains 12 items which are all posi- 
tively phrased. For each of the 132 items in the questionnaire, the respondent has a 
5-choice answer scale that ranges from complete disagreement (1 = very false for 
me) to complete agreement (5 = very true for me). 
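
The balanced keying described above can be illustrated with a short Python sketch. It assumes that a facet score is the simple sum of the 12 item responses after reversing the negatively phrased items; the source does not spell out the scoring rule, and the function name and item positions are hypothetical.

```python
def score_facet(responses, negatively_keyed, scale_min=1, scale_max=5):
    """Sum the item responses of one facet scale after reversing the
    negatively phrased items (assumed scoring rule, for illustration).

    responses        -- list of answers on the 1..5 scale
    negatively_keyed -- set of positions (0-based) of negatively phrased items
    """
    total = 0
    for position, answer in enumerate(responses):
        if not scale_min <= answer <= scale_max:
            raise ValueError(f"answer {answer} is outside the rating scale")
        if position in negatively_keyed:
            answer = scale_min + scale_max - answer  # 1 <-> 5, 2 <-> 4
        total += answer
    return total


if __name__ == "__main__":
    reversed_items = set(range(6, 12))  # hypothetical: last six items reversed
    # A respondent who agrees with every statement, regardless of content
    yea_sayer = [5] * 12
    print(score_facet(yea_sayer, reversed_items))
```

Under this rule a respondent who answers "very true for me" to all 12 items ends up at the scale midpoint (36 out of a possible 12 to 60), which is how balanced keying neutralizes an acquiescent response set.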

The psychometric properties of the BFQ have been repeatedly validated in Italian 
samples (see Caprara et al., 1993; Caprara, Barbaranelli, & Livi, 1994). Table 2 
summarizes the factorial structure of the BFQ on the normative sample that com- 
prises 9,333 subjects (50.5 % females, 49.5 % males) aged from 16 to 84 years old 
(M = 38.5, SD = 15.7). This solution was obtained by the least-squares principal 
factor method as implemented in the Comrey program for factor analysis (Comrey, 
1973; Comrey & Lee, 1992). The scree test of eigenvalues (Cattell & Vogelmann, 
1977) was used as a tool for establishing the number of factors that were present in 
the solution. The factors were rotated using the Tandem Criteria for orthogonal 
analytic rotation (Comrey, 1967). In comparison with other rotation methods the 
Tandem Criteria permits one to obtain factor solutions which are more "clean" and 
more fitted to the observed correlations among variables (Lee & Comrey, 1979). 
The solution presented in Table 2 is in line with expectations: all the BFQ facet 
scales present high loadings on the intended factor and low loadings on 
the other factors, with the exception of Perseverance, which also loads on Energy. 
Finally, Cronbach's alpha values of the different scales ranged from .74 to .90. 
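
For an orthogonal solution such as this one, the percentage of variance explained by a factor equals the sum of its squared loadings divided by the number of variables. The Python sketch below checks this against the Emotional Stability and Openness columns of Table 2; the function name is chosen for the example.

```python
def percent_explained(loadings):
    """Percentage of total variance explained by one orthogonal factor:
    the sum of squared loadings divided by the number of variables."""
    return 100.0 * sum(value * value for value in loadings) / len(loadings)


# Loadings of the ten facet scales, in the row order of Table 2
emotional_stability = [.00, .01, .00, .18, .00, .08, .80, .82, .07, .03]
openness = [.26, .18, .26, .09, -.05, .24, .08, .03, .64, .69]

if __name__ == "__main__":
    print(round(percent_explained(emotional_stability), 1))
    print(round(percent_explained(openness), 1))
```

Both results agree with the last row of Table 2 (13.6 and 11.3). The same check can help to verify individual loadings when a table is hard to read.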


Cross-Cultural validity of the BFQ 


The BFQ has been translated into different languages. While French, Spanish, and 
Slovenian versions have already been published, validation studies are in progress 
for German, English, Dutch, Swedish, Czech, Greek, and Hungarian. 

The generalizability of the BFQ across four different countries (Italy, USA, 
Spain, and Germany) was demonstrated in a recent study where multiple multivariate 
methods such as Simultaneous Component Analysis, Exploratory, and Confirmatory 
Factor Analyses were used (Caprara, Barbaranelli, Bermudez, Maslach, & Ruch, 
2000). In this study, two kinds of analyses have been performed: An analysis at the 
level of the single sentences that are used to assess the constructs (item-level 
analysis), and an analysis at the level of the aggregates of items that define 
subdimensions or facets for the Big Five (scale-level analysis). The results of the 
item-level analysis showed that the items were highly congruent in measuring the 
same constructs across countries (i.e., structural equivalence), although they showed 
some differential functioning in the different countries (i.e., item bias). In the same 
study, scale-level analyses showed that the Italian, American, German, and Spanish 
versions of the BFQ have factor structures that are fully comparable. Because the 
pattern of relationships among the BFQ facet-scales is basically the same in the four 
different countries, different data analysis strategies (Simultaneous Components 
Analysis, Exploratory Factor Analysis, Confirmatory Factor Analysis) converge in 
pointing to a substantial equivalence among the constructs that these scales are 
measuring. 

Table 3 shows results from factor analyses on the 4 different samples. These 
analyses have been carried out on the facet scales using all the items that turned out 
not to be affected by item bias. Congruence coefficients (Tucker, 1951) among the 
different solutions ranged from .94 to .99. Cronbach's alpha ranged from .74 to .85 
in the American sample, from .73 to .87 in the Spanish sample, and from .65 to .85 
in the German sample. 
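
Tucker's congruence coefficient between two columns of factor loadings is the sum of their cross-products divided by the square root of the product of their sums of squares. The Python sketch below is an illustration only: because the country-specific loadings are not used here, it compares the Emotional Stability columns of Table 2 (self-report) and Table 5 (other ratings), and the function name is chosen for the example.

```python
import math


def tucker_congruence(x, y):
    """Tucker's congruence coefficient between two loading vectors."""
    if len(x) != len(y):
        raise ValueError("loading vectors must have the same length")
    cross_products = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return cross_products / norm


# Emotional Stability loadings of the ten facet scales
self_report = [.00, .01, .00, .18, .00, .08, .80, .82, .07, .03]    # Table 2
other_rating = [-.01, -.06, .04, .32, .01, -.03, .73, .88, .08, .01]  # Table 5

if __name__ == "__main__":
    print(round(tucker_congruence(self_report, other_rating), 2))
```

Unlike a correlation, the coefficient does not center the loadings, so it is sensitive to their absolute size as well as to their pattern; identical columns yield a value of 1.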


Construct validity of the BFQ: Correlations with other 
instruments 


The construct validity of the BFQ has been examined in various studies by correlating 
its scales with: a) the scales of other Big Five markers such as the NEO-PI 
(Costa & McCrae, 1985) and the NEO-PI-R (Costa & McCrae, 1992); b) the scales 
of questionnaires measuring competing personality taxonomies such as the Eysenck 
Personality Questionnaire (EPQ; Eysenck & Eysenck, 1976), the Comrey Personality 
Scales (CPS; Comrey, 1995), Cattell's 16PF, Form A (Cattell, Eber, & Tatsuoka, 
1970), and the Multidimensional Personality Questionnaire (MPQ; Tellegen, 1982); 
c) scales measuring important criterion variables such as the Wechsler Adult Intelligence 
Scale (WAIS; Wechsler, 1981) and the scales of perceived psychological 
well-being (PPWB; Ryff, 1989). Table 4 summarizes the findings obtained in these 
studies. 


Table 3. Rotated factor matrices (principal axis factoring followed by rotation) in the Italian, American, Spanish, and German samples 

[Table body not recoverable from the source. Columns give the loadings of the ten BFQ facet scales on Energy, Friendliness, Conscientiousness, Emotional Stability, and Openness in each sample: ITA = Italy; USA = United States; SPA = Spain (N = 1,298); GER = Germany.] 




Table 4. Correlations of BFQ domain scales with scales of other instruments 

BFQ scales (columns): E = Energy; F = Friendliness; C = Conscientiousness; S = Emotional Stability; O = Openness; L = Lie 

Scale E F C S O L 

NEO-PI (N = 288) 
N Neuroticism 2.3727 -.29** -.17** -,80** -.21** -.28** 
E Extraversion 71^* 29'* 2137 ns neler 7123 
O Openness to Experience 227: 23355 ns ns 26552 ns 
A Agreeableness ns .66** ns 22025 Sue 182° 
C Conscientiousness 23155 ns 16352 112° ns уе 
NEO-PI-R (N = 695) 
N Neuroticism ns „152° as -.78** ns -.18** 
E Extraversion 26745 -.28'* 323" -.23** 723: ns 
O Openness to Experience 6 .09* ns -.16** .68** -.16** 
А Agreeableness .36** 2502 ns 281252 23056 3115s 
C Conscientiousness .09* ns 37:356 -.19** ns .28"" 
EPQ (N = 186) 
P Psychoticism ns -.26** -.36** ns ns ns 
E Extraversion .63** 23955 ns 22325 22 655 „36° 
N Neuroticism 222525 -.21* ns - 757" 21° -.26** 
L Lie Scale ns .29** 23025 73258 ns 25055 
CPS (N = 288) 
T Trust ns .47** ns .24** ns ns 
O Orderliness -.13* ns .47“ ns -.26** 18 
C Social Conformity ns ns ns ns 281155 „21 
A Activity .66** 22055 ,42** 32722 232** oiler 
5 Emotional Stability .34** 2655 ns s iss .14* 2355 
E Extraversion .74“ .24** .12* сга gel oe .24** 
M Masculinity 2202 ns ns oos 1955 .16** 
P Empathy ns „ЭЛТ П$ 12° 5195 ns 
R Response Bias .20** ns ns 22 One ns .16** 
V Validity ns .34** 2155 .18* ns BTE 
MPQ (N = 375) 
NA Negative Affect ns -.30** ns -.48** ERIS ns 
PA Positive Affect .66** 2274" 23625 ns .29** 12525 
CT Constraint -.14* ns .43** 157 -.22** S 
16PF (N = 608) 
A Outgoing 2158 1955 .14** ns ns .08* 
B Bright ns ns .10** ns .14** -.14" 
C Emotionally Stable .34'* 19° 217** .48** 20° 26" 
E Assertive .46** -.19** ‚10** -.17“ 11355 156 
F Happy-go-lucky 25115" .28** 106 ME mg ns 
G Conscientious .08* sitat 435 .08* ns .19“ 


H Venturesome 6355 .26** 1925 32 Т1 bye 
I Tender Minded -.24** 2055 ns -,10** .09* ns 
L Suspicious ns -.26** ns -.30"* ns 32:5 
M Imaginative ns Sei ns 137" 220755 ns 
N Shrewd -.20** ns ns ns -.18** ОШ 
О Apprehensive 2372 2511699 = 2158 -.59** -.20** 202222 
Q1 Experimenting 16: ns ns Sul les 22075 eui bes 
Q2 Self-sufficient 202322 -.30** З= УЫ ns ns 
Q3 Compulsive 12° „10° 395 213225 ns 227** 
Q4 Tense -.24** = 25% ze bye = 255 -.18** 2375" 
FB Fake Bad = 1855 -.31*^* 1525 -.24“ +. 217: ns 
FG Fake Good .24** 1182 ЗА :38:* 1225 .30** 
WAIS (N = 76) 
IQ-V Verbal IQ ns ns ns 2235 .40** ns 
IQ-P Performance IQ ns ns -.23* ns УЛА ns 


PPWB (N = 985) 


Dominance-Satisfaction З= .08* „ЗО 235^ 21555 23752 
Positive relations 22295 25025 MAS 24“ 23022 ns 

Autonomy #30" -.14** 12° 27555 13255 20955 
Self-acceptance 2 .08* .19** .44** 2262€ sse 
Search for Novelty 14325 :225* {722° 22255 45 -.09** 
Sense of growth 0720025 SIS .28** .08* 265 .08* 


Note: * p < .05; ** p < .01; ns = statistically non significant. 


As can be seen in Table 4, BFQ scales showed a clear-cut pattern of correlations 
with the scales contained in the various instruments examined. In particular, Energy 
was positively and highly correlated with the various extraversion scales, but also 
with scales related to activity, assertiveness, venturesomeness, and positive affect. 
Friendliness was positively correlated with scales related to agreeableness, trust, 
empathy, and positive relations with others. Conscientiousness was positively cor- 
related with scales related to conscientiousness, orderliness, activity, and constraint. 
Emotional Stability was negatively correlated with the various neuroticism scales, 
and with scales measuring apprehensiveness, tension, and negative affectivity, but 
was positively correlated with scales measuring emotional stability and self- 
acceptance. Openness was positively correlated with scales measuring openness to 
experience, search for novelty, and with verbal IQ. Finally, the Lie scale was posi- 
tively correlated with the other social desirability scales and with response bias 
scales. These results confirm both the convergent and the discriminant validity of
the BFQ scales.


Construct validity of the BFQ: Multitrait-Multimethod 
analyses 


In a recent multitrait-multimethod (MTMM) study Barbaranelli and Caprara (2000) 
Table 5. Rotated factor matrix of BFQ - Other ratings (Varimax orthogonal rotated solution)

                                      Factors
BFQ Facet Scales             E      F      C      S      O
Dynamism                    .77    .29    .03   -.01    .25
Dominance                   .69   -.25    .12   -.06    .18
Cooperativeness             .07    .76    .15    .04    .23
Politeness                 -.02    .65    .01    .32    .02
Scrupulousness             -.04    .03    .77    .01    .08
Perseverance                .40    .18    .58   -.03    .18
Emotion Control             .07    .05   -.04    .73    .05
Impulse Control            -.19    .28    .09    .88    .01
Openness to Culture         .16    .06    .37    .08    .51
Openness to Experiences     .33    .22    .05    .01    .62
% of explained variance    14.0   13.2   11.2   14.2    8.4

Note: E = Energy; F = Friendliness; C = Conscientiousness; S = Emotional Stability; O = Openness;
N = 1,200.


examined the construct validity of the Big Five using the BFQ and the so-called
BFO¹ ("Big Five Observer"; Caprara, Barbaranelli, & Borgogni, 1994), an
adjective-based measure of the Big Five. Both instruments were used to collect
self-reports and other-ratings. In particular, 200 subjects (100 males and 100
females) described their own personality on the BFQ and on the BFO, and each was
rated by 6 acquaintances (for a total of 1,200 other-ratings) with the same
instruments. Before describing the results of the MTMM study, we believe it is
useful to present some results on the psychometric characteristics of the BFQ
other-ratings, as well as of the BFO.

Table 5 presents the results of an exploratory factor analysis conducted on the
BFQ other-ratings with the aim of examining its internal validity. The expected
factor structure was confirmed: Cronbach's alpha coefficients of the scales ranged
from .77 to .89, Tucker congruence coefficients with the BFQ self-report factor


¹ The BFO was designed to measure the personality traits comprised by the five-factor model
by means of 40 pairs of bipolar adjectives (8 pairs for each of the 5 factors). The adjectives were
selected from the psycholexical study in the Italian language conducted by Caprara and Perugini
(1994), who identified a list of 492 Italian adjectives as being useful to describe personality. The
most representative adjectives for each factor were identified by three expert judges using a
"cluster sampling" approach (see Goldberg, 1992). Then, for each of the five factors, the eight
adjectives with the highest corrected item-total correlation coefficients were selected, and opposite
adjectives for them were identified using a thesaurus of the Italian language (Gabrielli, 1977).
Following Goldberg and Kilkowski (1985), we believed that bipolar adjectives could reduce the
intrinsic ambiguity of several terms by specifying the dimension being measured, so that no term
in a pair of opposites would be interpreted in isolation and idiosyncratically. Each of the 40 pairs
of opposite adjectives was rated on a 7-point scale. To control for acquiescence, the first adjective
presented in half of the pairs goes in the direction of the dimension measured (e.g.,
Dominant/Submissive), while in the remaining pairs the order of presentation is reversed (e.g.,
Introverted/Extraverted).
Table 6. Factor structure for the BFO Self Report and Other Ratings


Self-Report Other Ratings
I II III IV V I II III IV V
Energy
Dominant/ Submissive Q5" Te 7966" УМУ .-34* o 07533  .42 .48 
Introverted/Extraverted СОЛИ 06 * 70 05. —21 —.07 06 R A 
Leader/Pawn 04.11 .40 .49 -28 -01 .08 -21 .42 .53 
Bold/Timid 09 .03 .61 .31 -.18 .08 -.07 -.08 .60 .35 
Retiring/Sociable 03 .01 .66 -01 .40 10 .01 .42 .65 .04 
Silent/Talkative ОООО И Зи 01 0035 "68.09 
Energetic/Unenergetic 06  .41 .49 .18 -02 .01 .45 -01 .44 .27 
Clumsy/Self-Confident 14 .16 .62 .28 .06 .14 .07 .09 .65 .24 
Friendliness 
Cold/Warm 251309 E O "ОТТО А 604 0220 52 26 ООО 
Selfless/Selfish 06202010505 220905959746 11917 155 02 13 
Hostile/Friendly 10 -01 .30 .04 .62 .14 .08 .65 .34 01 
Trusty/Suspicious ЗОО 0033 29 -05 39 18 16 
Rude/Gentle 18 "51203 06) 762 24 29 6з 04-02 
Indulgent/Severe 21 -.21 -.09 -.14 .40 ..30 -.12 .46 -.07 -.02 
Tolerant/Intolerant 34 -.06 -.08 .06 .46 .38  .01 51 -.13  .05 
Unfair/Fair (Ою 23 04 5m 52 02 2/ .54 07 15 
Conscientiousness 
Careful/Careless 02 ӨЗ 05" 09 - 01. 12 59 .03 -06 .15 
Scrupulous/Lax -05 .68 -.06 .07 -.06  .01 .67 -03 -.08 .12 
Well Organized/Disorganized 04  .70 .01 -.12 -07 .21 .63  -.01  -.14  -.04 
Lazy/Hard-working 09 62 .34 -07 .11 10 64 16 33. -.08 
Tired/Tireless .16 .44 .47 -02 .10 .07 .53 .09 .45 -.06 
Weak willed/Self-disciplined "ОВИ ОУН 28 ОТ 23 02 67 26 20 06 
Negligent/ Conscientious .05 A О АУН АО 51 .02 .09 
Undependable/Reliable 07 .48 -09 .19 .29 .07 .46 .31 -05 .22 
Emotional Stability 
Anxious/Serene JL 009 11 09 3:903 — 73. -.03 ММ 24 02 
Stable/Instable 50 .40 -01 .12 -01 .56 .31 .04 -.04 .16 
Patient/Impatient 253285 18 -13 -05 279 .58 .19 .32 -15 -01 
Nervous/At ease 82 fo) ooh] 01 5 81 04 .6 Об ено 
Relaxed/Tense „БОШОО ОЛИ - 02. 77 -02_ 0266046705 
Satisfied/Unsatisfied 47 18 .24 .11 .04 46 .19 10 .23 .21 
Worrying/Calm 12800850305-:06 05027 174505791507 ла 02-09 
Vulnerable/ Resistant .48 .21 .31 .06 -06 .41 .25 -03 .37 .07 
Openness 
Original/Conventional 09 cals su ЖЧӨТ ee, 02 -07m 05 1464 
Unintelligent/Intelligent "NE 412 ОВЕ 54) m24 - 10622 36 08h 388 
Not Receptive/Receptive -40 .08 22 .40 .20 -09 22 .32 22 .50 
Informed/Uninformed 144.148 .01 .57 -01 .13 .28 .09 -.04 758 
Creative/Uncreative 2102880722515 589875222909 2250722215 БЕ 09 I) 57 
Dull/Sharp ЭО Оз БАН ЗЕ ONE 31 132 08.36 
Innovating/Traditional КОБЕ БН 1166468 02 050107 07 — 12 .63 
Uncultured/Cultured - 01 .09 -.06 .65 .21 -.03 .24 .27 -.01 .50 
% of Explained Variance vU Cb V SLT 8 ПОЕ 09 986 936680 


Note: These are Varimax-rotated principal components for 1,576 subjects (self-report) and 1,350 
raters (other ratings); Loadings on the intended factor are boldface. 
structure ranged from .97 to .99, and the correlation coefficients with the same
scales of the BFQ - self report ranged from .53 to .72.
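
The congruence coefficients mentioned here compare two columns of factor loadings. As a minimal sketch, assuming Tucker's usual definition (the sum of cross-products divided by the square root of the product of the two sums of squares), the computation looks as follows. Both loading vectors are hypothetical values chosen for illustration and are not taken from the study.

```python
import math


def tucker_congruence(x, y):
    """Tucker's congruence coefficient between two factor-loading vectors."""
    if len(x) != len(y):
        raise ValueError("loading vectors must have the same length")
    cross_products = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return cross_products / norm


# Hypothetical loadings of ten facet scales on one factor in two solutions
# (illustrative numbers only, not the published ones).
loadings_other = [.77, .69, .07, -.02, -.04, .40, .07, -.19, .16, .33]
loadings_self = [.75, .72, .10, .00, -.02, .35, .05, -.15, .20, .30]

phi = tucker_congruence(loadings_other, loadings_self)
print(f"congruence = {phi:.3f}")
```

Values close to 1 indicate that the two solutions define essentially the same factor, which is how the range of .97 to .99 reported above is usually read.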

Table 6 presents the factor structure of the BFO in self-report and in other ratings.
Congruence coefficients between these two solutions ranged from .88 to .95.
Cronbach's alpha reliability coefficients ranged from .69 to .82 in self-report, and from
.72 to .83 in other-ratings. Correlations between self-report and other-ratings ranged
from .47 to .70.
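
For readers less familiar with the reliability index reported here, the following is a minimal sketch of Cronbach's alpha under its standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The item scores are hypothetical 7-point ratings invented for illustration; they are not BFO data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    `items` holds one list of scores per item, with respondents in the
    same order in every list.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    sum_item_variances = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


# Hypothetical ratings of five respondents on three items (illustration only).
item_scores = [
    [4, 5, 3, 6, 2],
    [5, 5, 2, 6, 3],
    [3, 6, 3, 7, 2],
]
alpha = cronbach_alpha(item_scores)
print(f"alpha = {alpha:.2f}")
```

The function uses sample variances throughout; using population variances instead gives the same alpha, because the correction factor cancels in the ratio.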

The aforementioned multitrait-multimethod study crossed two response modes
(sentence-based questionnaire and list of adjectives) with two sources of information
or raters (self-report and other-ratings). The 20 × 20 multitrait-multimethod matrix
derived from these measures was analyzed via structural equation models
(Bollen, 1989) according to the criteria proposed by Widaman (1985), Marsh (1989),
and Bagozzi (1994). In particular, four different models were compared. In all
these models, global fit indexes (such as the chi-square statistic, the Comparative Fit
Index, and the Root Mean Squared Residual; see Bollen & Long, 1993) were
moderate; convergent validity was supported by the high loadings of scales measuring
the same trait; discriminant validity was supported by low correlations among
the different traits; and method variance and error variance were moderate or low.

Among the different models examined, one turned out to be particularly interest- 
ing. This model was called CFA-RARE (“Confirmatory Factor Analysis — 
Rater/Response mode" model) because it distinguished between two different kinds 
of method factors: rater (self and other) and response mode (questionnaire and list 
of adjectives). Table 7 presents a summary of parameter estimates for this model. 

Trait factor loadings were all significant, suggesting that the measures are good
indicators of their respective trait factors and thus supporting high convergent validity.
Loadings on the response-mode method factors (i.e., questionnaire and adjective
list) generally turned out to be low except for Openness to Experience and for
Friendliness: the influence of response mode is therefore relevant only for these two
factors. Loadings on the rater method factors (self-report and other-ratings)
showed that the effect of the self-report method factor was relevant for Energy,
Conscientiousness, and Openness, while the effect of the other-ratings method factor was
relevant for Friendliness and for Emotional Stability. Residual variance was significant
but moderate for all scales, suggesting a low to moderate percentage of variance due
to unique factors or to measurement error. Finally, discriminant validity was
confirmed by low correlations between the traits, which were all lower than .30 with the
exception of Openness to Experience with Energy (r = .54, p < .001), Emotional
Stability with Friendliness (r = .32, p < .001), and Conscientiousness with Openness
to Experience (r = .37, p < .001).

From the results presented in Table 7 it is possible to partition the variance of 
each personality factor into variance due to the trait measured (which reflects the 
convergence of the four different scales used for measuring each trait), variance due 
to response mode, and rater method factors (which reflects the particular method 
used to assess the traits), and residual variance (which reflects a combination of
specific variance and measurement error). Overall, 66 per cent of Energy variance was
explained by the trait factor, 4 per cent by response mode, 8 per cent by raters, while
Table 7. Traits and methods loadings for the Rater-Response Mode model


Traits Methods
E F C S O Qu Ad Se Ot rv


Questionnaire Self Report 


Energy .73 .16 .45 .23 
Friendliness .65 .33 .06 .46
Conscientiousness .69 .30 .30 .35
Emotional Stability .88 .01 .05 .23
Openness .40 .69 .46 .15
Adjective List Self Report 
Energy .72 1 55) 232 
Friendliness .65 xW 00 .42 
Conscientiousness SA 22 .36 .29 
Emotional Stability .78 .26 .10 .32
Openness .54 .17 .61 .31
Questionnaire Other Ratings
Energy .93 .20 -.02 .10
Friendliness .83 -.01 .41 .14
Conscientiousness .80 .29 .17 .24
Emotional Stability .76 -.09 54 .14 
Openness 73) .41 du) — ge) 
Adjective List Other Ratings 
Energy .85 .26 2052724 
Friendliness .67 .45 .40 .18
Conscientiousness .80 .24 .20 .27 
Emotional Stability .67 .31 ashy oti) 
Openness .72 .42 .14 .28


Note: E = Energy, F = Friendliness, C = Conscientiousness, S = Emotional Stability, O = Openness;
Qu = Questionnaire, Ad = Adjective list, Se = Self-report, Ot = Other ratings, rv = residual variance.
All coefficients are significant at the .05 level or higher, except those in italics. Results are from
the standardized solution.


22 per cent was residual variance; 50 per cent of Friendliness variance was
explained by the trait factor, 11 per cent by response mode, 9 per cent by raters, and 30
per cent was residual variance; 57 per cent of Conscientiousness variance was
explained by the trait factor, 7 per cent by response mode, 7 per cent by raters, while
29 per cent was residual variance; 60 per cent of Emotional Stability variance was
explained by the trait factor, 4 per cent by response mode, 15 per cent by raters,
while 21 per cent was residual variance; and 38 per cent of Openness variance was
explained by the trait factor, 22 per cent by response mode, 15 per cent by raters, while
26 per cent was residual variance. In general, trait variance was very high for all
factors with the exception of Openness; method variance (i.e., response mode plus
rater) was high for Openness and low for all the other factors. Residual variance was
moderate. On average, 54 per cent of variance was explained by traits, 10 per cent
by response mode, 11 per cent by raters, and 25 per cent was residual variance.
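
The partition described above can be retraced with simple arithmetic. The sketch below assumes that a factor's share of variance is the mean squared standardized loading across the four measures of that trait. This is one plausible reading that reproduces the reported trait percentages, not a procedure the chapter states. The trait loadings are those printed in Table 7 for three of the traits, and the percentage tuples are the figures quoted in the text.

```python
def share_of_variance(loadings):
    """Mean squared standardized loading, expressed as a percentage."""
    return 100 * sum(value * value for value in loadings) / len(loadings)


# Trait loadings from Table 7, in the order: questionnaire/self-report,
# adjective list/self-report, questionnaire/other, adjective list/other.
trait_loadings = {
    "Energy": [.73, .72, .93, .85],
    "Friendliness": [.65, .65, .83, .67],
    "Emotional Stability": [.88, .78, .76, .67],
}
trait_share = {name: round(share_of_variance(values))
               for name, values in trait_loadings.items()}

# Percentages quoted in the text: (trait, response mode, raters, residual).
reported = {
    "Energy": (66, 4, 8, 22),
    "Friendliness": (50, 11, 9, 30),
    "Conscientiousness": (57, 7, 7, 29),
    "Emotional Stability": (60, 4, 15, 21),
    "Openness": (38, 22, 15, 26),
}
averages = [sum(column) / len(column) for column in zip(*reported.values())]

print(trait_share)
print([round(value, 1) for value in averages])
```

Averaging the rounded percentages gives 54.2, 9.6, 10.8, and 25.6. The first three agree with the reported averages after rounding, while the residual comes out slightly above the 25 per cent quoted, presumably because the published averages were computed from unrounded estimates.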
The personality of voters and of consumers


The BFQ in the domain of political behavior 


In a recent study Caprara, Barbaranelli, and Zimbardo (1999) explored new relation- 
ships between basic personality profiles of voters and their choice of political party 
affiliation. The Italian political system has recently moved from extreme, ideologi- 
cally distinctive parties to complex coalitions (“Center-Left” and "Center-Right" 
coalitions). We found significant evidence for the utility of the Big Five Model of 
Personality in distinguishing voter party identification. More than 2000 Italian vot- 
ers who identified themselves as belonging either to "Center-Left" or "Center- 
Right" political coalitions differed systematically on several personality dimensions 
measured by the Big Five Questionnaire. 

In particular, using a MANCOVA design, significant relationships were found
between four of the five BFQ factors and the political orientation
of the participants in the investigation. Voters identified as supporting the
Center-Right scored higher than Center-Left voters on Energy, F(1,1942)
= 21.64, p < .001. They were also slightly higher on Conscientiousness,
F(1,1942) = 5.58, p < .05. Center-Left voters scored significantly
higher on Friendliness, F(1,1942) = 20.07, p < .001, as well as on Openness,
F(1,1942) = 19.80, p < .001. No difference between the two groups of voters
emerged for Emotional Stability, F(1,1942) = 1.44, p = .23. Figure 1 shows the
personality profiles of the two voter groups, expressed as T-scores for each of the five
domains of the BFQ. It is noteworthy that the differences in personality traits due to
political partisanship are still significant after controlling for several demographic
variables.

Figure 1. Personality profiles of Italian electors on the BFQ.
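
The F ratios reported above indicate statistical significance but not the size of the differences. As a rough illustration, the sketch below converts each F ratio into partial eta squared using the standard identity eta_p^2 = (F * df_effect) / (F * df_effect + df_error). These effect sizes are not reported in the chapter; they are derived here only from the published F values and degrees of freedom.

```python
def partial_eta_squared(f_ratio, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return f_ratio * df_effect / (f_ratio * df_effect + df_error)


# F(1, 1942) values quoted in the text for the five BFQ domains.
f_ratios = {
    "Energy": 21.64,
    "Conscientiousness": 5.58,
    "Friendliness": 20.07,
    "Openness": 19.80,
    "Emotional Stability": 1.44,
}
effect_sizes = {name: partial_eta_squared(f, 1, 1942)
                for name, f in f_ratios.items()}

for name, eta in effect_sizes.items():
    print(f"{name}: {eta:.4f}")
```

On this calculation each domain accounts for roughly 1 per cent of the variance or less, so the differences between the two electorates are reliable in a sample of this size but small in magnitude.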

Finally, through a logistic regression we examined the impact of personality traits
on political choice (Center-Left vs. Center-Right). None of the demographic
variables considered (gender, age, and education) entered into the final regression
equation, which was highly significant. The only variables that had a significant
impact on political preference were four of the BFQ domain scales, namely Energy (r
= -.12, p < .001), Friendliness (r = .06, p < .001), Openness (r = .13, p < .001), and
Conscientiousness (r = -.05, p < .001). The equation allowed the correct
classification of 61.4 per cent of Center-Left voters and 57.6 per cent of Center-Right voters,
with an overall hit rate of 59.5 per cent. These results are quite surprising, since only
personality had a significant impact on political preference, while none of the
demographic variables had an impact on party choice once the personality
effects were partialled out.
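
The overall hit rate is a weighted average of the two group-specific rates. The sketch below shows the arithmetic. The group sizes are an assumption made for illustration (the chapter does not give them); equal sizes are used because they reproduce the reported overall figure.

```python
def overall_hit_rate(group_rates, group_sizes):
    """Overall percentage correctly classified, weighting each group by its size."""
    total = sum(group_sizes)
    return sum(rate * size for rate, size in zip(group_rates, group_sizes)) / total


# Group-specific hit rates quoted in the text (Center-Left, Center-Right).
rates = [61.4, 57.6]
# Hypothetical, equal group sizes; the actual sizes are not reported here.
sizes = [1000, 1000]

overall = overall_hit_rate(rates, sizes)
print(f"overall hit rate = {overall:.1f} per cent")
```

With two groups of equal size, assigning every voter to one group would classify 50 per cent correctly, so a hit rate of 59.5 per cent is a modest improvement over that baseline.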


The BFQ in the domain of consumer behavior 


With the aim of extending the use of the Big Five to domains different from those
usually considered in personality research, we started to investigate the role of the
Big Five in consumer behavior and habits. About 5,000 adults,
representative of the Italian national population, were administered a short version of the
BFQ along with a survey questionnaire aimed at investigating their consumption
and purchasing habits and behaviors. We focused our attention especially on the
following variables: a) the propensity to make one's own purchases in supermarkets,
malls, hard discount stores, and the like; b) the time spent watching television; c) the
books read in the past year; d) the topics preferred in newspapers; e) the propensity
toward purchasing life insurance and investing in pension funds; f) the propensity
toward savings (by means of Government stocks, stocks, investment funds,
passbooks, bank current accounts). Background variables such as gender, age, and
education were controlled for.

The pattern of high scores on Friendliness, Openness, and Conscientiousness
appeared to be associated with the propensity toward making one's own purchases in
supermarkets, malls, hard discount stores, and the like. This pattern reveals the profile of a
consumer who is trustful, open to innovation, scrupulous, and methodical. The
pattern of high scores on Conscientiousness and Openness and low scores on Emotional
Stability was associated with the time spent watching television. This pattern reveals
the profile of a user who actively searches for novelty and is scrupulous but somewhat
impulsive. The pattern of high scores on Friendliness and Openness was associated
with the books read in the past year.

The different topics preferred in newspapers corresponded to different personality
patterns. Subjects who preferred topics such as fashion, health, and the like
showed high scores on Friendliness and Conscientiousness, but low scores on
Emotional Stability. Subjects who preferred topics such as politics, culture, and
economics showed high scores on Openness. Subjects who preferred topics such as
shows, sports, and comics showed low scores on all the Big Five, but especially on
Openness.

The pattern of high scores on Energy, Openness, and Conscientiousness was
associated with the propensity toward purchasing life insurance and investing in
pension funds, revealing the profile of a consumer who is scrupulous, active, and
positively oriented toward novelty. Finally, a pattern of high scores on Friendliness,
Openness, and Conscientiousness was associated with the propensity toward savings
(by means of Government stocks, stocks, investment funds, passbooks, bank
current accounts), revealing the profile of a consumer who is scrupulous, trustful, and
positively oriented toward novelty.

These results support the utility of broad personality dimensions as indicators of
consumer habits, which may open new perspectives for marketers, as well as for other
contexts where personality variables might be useful for the prediction and
explanation of behavior. The Big Five turned out to be substantially and differentially
associated with diverse behavioral habits while controlling for background demographic
variables such as gender, age, and education. These findings suggest that in the
marketing domain a close scrutiny of personality patterns associated with specific
behaviors, habits, and preferences may be important for the identification of specific targets
for specific products, as well as for a communication strategy that takes into account
the different individual inclinations.


Describing personality in late childhood and early 
adolescence 


To assess the Big Five in late childhood through self-report as well as through parent 
and teacher ratings, we constructed a new questionnaire, the "Big Five Question- 
naire — Children" (BFQ-C), consisting of 65 items, 13 for each of the five factors 
(Barbaranelli, Caprara, & Rabasca, 1998; Barbaranelli, Caprara, Rabasca, & Pas- 
torelli, 2001). 

Factor analyses on self-report and other ratings of elementary and junior high
school children confirmed the expected Big Five structure. Factor solutions showed
a high degree of congruence. There was a moderate although significant
convergence between self-reports, parent ratings, and teacher ratings. However, several differences
emerged regarding the composition of the factors in the different solutions. In
particular, the two dimensions that presented the largest differences across the different
data sets were Conscientiousness and Intellect/Openness, especially when
considering the rater and the age group. The two basic components of Conscientiousness
(i.e., the proactive and the inhibitive; see McCrae & John, 1992) tended to emerge in
separate factors, especially in self-report and teacher ratings of elementary school
children. While the items related to the proactive component (perseverance and
hard work) tended to load on the Intellect/Openness factor, the items related to the
inhibitive component (orderliness and scrupulousness) tended to define a narrow but
distinct factor. Intellect/Openness items, in turn, tended to define two separate
clusters: one cluster related to academic performance, which was found to be associated
with proactive Conscientiousness, and another cluster related to Openness to
Experience, which was found to be associated with Energy/Extraversion. These results
mainly confirm earlier results obtained by Peabody and Goldberg (1989), who
further differentiated "controlled" and "expressive" features of Intellect/Openness, and
by Mervielde et al. (1995), who noticed the tendency of the "controlled" form of
Intellect/Openness to cluster with Conscientiousness markers.

The correlations between the Big Five and the different criteria considered for the
validation of the BFQ-C were significant and high. Intellect/Openness and
Conscientiousness were the most important personality correlates of Academic
Achievement across different informants, with correlations ranging from .39 to .66. These
two traits were also the most important correlates of Externalization Behavioral
Problems as measured by the Child Behavior Checklist (Achenbach & Edelbrock,
1983; 1986), with correlations ranging from -.29 to -.41. Emotional
Instability turned out to be the most relevant correlate of Internalization as measured by
the Child Behavior Checklist (Achenbach & Edelbrock, 1983; 1986), with
correlations ranging from .28 to .42. These results replicate what has been found in other
studies that used different instruments for assessing the Big Five in childhood (e.g.,
John, Caspi, Robins, Moffitt, & Stouthamer-Loeber, 1994; Mervielde et al., 1995).
Finally, the BFQ-C factors showed high and clear-cut correlations with the three
factors of the Eysenck taxonomy as measured by the Junior Personality
Questionnaire (JPQ; S.B.G. Eysenck, 1965). In particular, JPQ Extraversion was positively
correlated with Energy/Extraversion, r(931) = .45, p < .001, with
Intellect/Openness, r(931) = .24, p < .001, and with Friendliness, r(931) = .18, p < .001.
JPQ Neuroticism was positively correlated with Emotional Instability, r(931) = .54,
p < .001, and negatively correlated with Intellect/Openness, r(931) = -.20, p < .001,
and with Energy/Extraversion, r(931) = -.18, p < .001. Finally, JPQ Psychoticism
was negatively correlated with Conscientiousness, r(931) = -.29, p < .001,
Friendliness, r(931) = -.26, p < .001, and Intellect/Openness, r(931) = -.23, p < .001, and
positively correlated with Emotional Instability, r(931) = .29, p < .001. This pattern
of correlations was mostly consistent with those found in adult subjects (e.g.,
Caprara, Barbaranelli, Borgogni, & Perugini, 1993; McCrae & Costa, 1985).
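
To see why even the smallest of these correlations reaches p < .001, the sketch below converts a correlation into its t statistic with the standard formula t = r * sqrt(df / (1 - r^2)). It assumes that the number in parentheses after r is the degrees of freedom, as is conventional, and it uses roughly 3.3 as the two-tailed critical value at the .001 level for a sample this large; both points are assumptions of this illustration.

```python
import math


def t_from_r(r, df):
    """t statistic for testing a Pearson correlation against zero."""
    return r * math.sqrt(df / (1 - r * r))


# Smallest absolute correlation reported with the JPQ scales, r(931) = .18.
t_smallest = t_from_r(0.18, 931)
approx_critical_t = 3.3  # approximate two-tailed critical value at p = .001

print(f"t = {t_smallest:.2f}, exceeds critical value: {t_smallest > approx_critical_t}")
```

With more than 900 degrees of freedom, correlations below .20 are comfortably significant, so the size of each coefficient is more informative than its p value when comparing the BFQ-C and JPQ scales.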

In light of these results, the BFQ-C may be considered a comprehensive
personality instrument for investigating systematically the origins and the
development of relevant individual differences in children in relation to adjustment and
maladjustment. As the BFQ-C can be used either as a self-report instrument or as an
instrument for collecting ratings from teachers and parents, it offers notable
advantages in educational and counseling settings. The personality dimensions of the Big
Five model are grounded in the lexicon people use in everyday life, and therefore
the BFQ-C is probably easily understood by laypeople as well, fostering
communication among parents, teachers, and counselors.
Conclusion


The studies reviewed here confirm the validity of the Big Five Questionnaire across 
different countries and languages as well as across different raters, response formats 
and applied settings. The Big Five dimensions are highly correlated with other in- 
ventories and appropriate external criteria. The BFQ-C developed for late childhood/ 
early adolescence furthermore attests to the possibility of extending this model for 
describing personality to earlier ages than the ones more usually considered. 

All these findings also support the practical value of the Big Five Model. In fact,
the convergence between self-reports and other ratings shows that this model,
capitalizing on the language that people use in everyday life, provides a lexicon (i.e., a
common set of markers) which may enhance interjudge agreement and reduce
interjudge variability (see McCrae & Costa, 1987). In this regard, greater accuracy of
personality descriptions and greater consensus can be achieved (see Funder, Kolar,
& Blackman, 1995; Kenny, Albright, Malloy, & Kashy, 1994), as self-other
agreement (i.e., how far an individual's view of himself or herself agrees with the view
another person has of that individual) and other-other agreement (i.e., how far two
independent judges agree about a particular individual) can be maximized.

While the Big Five cover relevant domains of personality, they are specific
enough to differentiate among specific aspects of personality, providing the
elements for articulate, fine-grained descriptions of personality. Indeed, they might
provide a compass to map individual differences into a common reference structure
(Ozer & Reise, 1994). Results from our studies in which the BFQ was used for
investigating voters' and consumers' personality further extend what has been found
in other fields such as organizational and industrial psychology (Barrick & Mount,
1991; Tett, Jackson, & Rothstein, 1991), educational psychology (Graziano & Ward,
1992; John et al., 1994), behavioral medicine (Dembroski & Costa, 1987; Siegman
et al., 1987), and psychopathology (Widiger & Trull, 1992). Those studies confirm that
the Big Five are not mathematical artifacts derived from non-generalizable data sets,
but have clear behavioral counterparts in the real world.

Although we acknowledge that much debate on the limitations and merits of the Big Five is still
going on, we do not hesitate to state again that the strength of the Big Five Model lies
in its "practical" value: the Five Factors might in fact represent "a
well-substantiated and agreed-upon framework for the structure of personality" (Briggs,
1992, p. 254) to be used to provide a common language for research and assessment
in personality psychology, and ultimately to generate new findings.
Chapter 6


Assessing children's traits with the Hierarchical 
Personality Inventory for Children 


Ivan Mervielde 
Filip De Fruyt 


Introduction 


Despite clear evidence that the FFM is useful to describe personality differences in 
children (Digman, 1963; Digman & Inouye, 1986; John, Caspi, Robins, Moffitt, &
Stouthamer-Loeber, 1994; Kohnstamm, Halverson, Mervielde, & Havill, 1998; 
Mervielde, Buyst, & De Fruyt, 1995; Mervielde & De Fruyt, 2000; 2001), there are 
relatively few inventories that are specifically designed to assess children's person- 
ality. In line with the lexical tradition, the validity of the FFM for younger age 
groups has been mainly demonstrated in studies using personality descriptive adjec- 
tives, using parents, caregivers, or teachers as informants (De Fruyt & Furnham, 
2000; Kohnstamm et al., 1998; Mervielde & De Fruyt, 2001). However, it remains
to be established whether trait adjectives form the most appropriate level to conduct 
developmental studies on individual differences. Apart from using adjectives in in- 
ventories for children, other major questions are whether the trait-sediment for chil- 
dren is simply a subset of the adult trait-sediment, whether there is something spe- 
cific for children that is never caught by adult trait-selections, and whether (some) 
"adult" adjectives have a specific "child" understanding as well. 

Behavior varies with age, and a personality inventory should ideally tap such de- 
velopments. The thinking about characteristics of people — young and old — 
probably takes place in terms of rather abstract words, such as trait adjectives. How- 
ever, dictionaries do not include references to the adequacy of adjectives to describe 
differences in particular groups, such as age or normal versus clinical groups. Sen- 
tence items might be more suitable to describe age-specific behavior, because they 
refer to less abstract qualities of behavior. 

Trait adjectives culled from dictionaries further reflect the passive rather than the 
active personality descriptive vocabulary, and hence do not take into account the
frequency of use of personality descriptors in everyday discourse. Although fre- 
quency of use is implicit in the lexical hypothesis, it has not been studied systemati- 
cally. Raters may spontaneously use and prefer a sentence item format, rather than 
trait adjectives, for self- and peer descriptions. De Raad (1985) found evidence of 
rather low frequency of use of adjectives in spontaneous talk. Finally, except for a 
limited number of studies on personality-descriptive verbs (De Raad, Mulder, 
Kloosterman, & Hofstee, 1988) and nouns (De Raad & Hoskens, 1990), the lexical 
tradition heavily relies on the study of trait adjectives. Sentence items could be an 
alternative to assess age and group-specific individual differences, eventually re- 
sulting in different factor solutions. 

Elphick, Slotboom, and Kohnstamm (1997) described three different strategies to 
assess the FFM in non-adult age groups. The first, most commonly applied approach 
uses an adult FFM measure for assessing children's or adolescents' traits. Parker 
(1997; 1998), for example, demonstrated that the NEO-FFI could be easily admin- 
istered to gifted adolescents. More recently, De Fruyt, Mervielde, Hoekstra, and 
Rolland (2000) showed that the more comprehensive NEO-PI-R is also structurally 
invariant in more heterogeneous samples of adolescents. Alternatively, item phras- 
ing and/or rating instructions of adult personality inventories are sometimes slightly 
adapted to make them more suitable for childhood or adolescent personality assess- 
ment, such as for the Junior Eysenck Personality Inventory (Eysenck, 1965). 

A second strategy derives FFM scores from childhood or adolescent inventories 
that are constructed to primarily operationalize another personality model than the 
FFM. Items and scales are rearranged in order to form reliable markers for the FFM 
dimensions. John et al. (1994) and Van Lieshout and Haselager (1994) derived five- 
factor scores from a re-analysis of Block's California Child Q-set (CCQ; Block & 
Block, 1980). Similarly, Judge, Higgins, Thoresen, and Barrick (1999) studied trait 
rank-order continuity across the life-span derived from an FFM rescaling of Q-sort 
data. The major drawback of this method is that these five-factor measures largely 
depend on the theoretical framework of the original instrument, and hence are at best 
proxies of FFM dimensions. 

Finally, another approach is a bottom-up strategy, directed to the construction of 
a new and specific FFM inventory assessing children's or adolescents’ traits. This 
third strategy first necessitates a careful analysis of the full range of personality dif- 
ferences that can be reliably observed in the target age group(s). The rationale be- 
hind this approach is that the kind and number of traits assessed should closely mir- 
ror the observable personality differences among individuals of the target age group. 
The Hierarchical Personality Inventory for Children (HiPIC; Mervielde & De Fruyt, 
1999) was constructed along such a bottom-up approach. The present chapter describes
its rationale, construction, and application for the assessment of personality
traits of children aged 6 to 12 years. 
Development


Describing the childhood personality domain 


The challenge to describe the range of childhood personality traits was taken up by 
an international research team, examining the content and structure of parental free 
descriptors of children aged 3 to 12 years (Kohnstamm et al., 1998). This
international consortium, with teams from Belgium, China, Germany, Greece, the USA,
Poland, and The Netherlands, included developmental and personality psychologists
interested in examining the developmental antecedents of the adult Big Five.

In order to sample the range of personality differences observable in children 
across different cultures, all research teams applied the same procedure,
interviewing parents and asking them to describe what they thought was characteristic of
their child. Third-year psychology students served as interviewers and were
instructed to give only neutral prompts to elicit further description, without additional
constraints. Interviews were tape-recorded and transcribed verbatim afterwards. The
Flemish team, for example, collected descriptors from 427 parents this way, de- 
scribing children of 3 to 13 years. However, in the subsequent phases, only the de- 
scriptors of children aged 5 to 13 years were used, because the primary objective 
was to focus on primary school age. 

The transcribed interviews were further segmented into small personality des- 
criptive utterances, which were subsequently assigned to a personality descriptive 
category system. This personality descriptive lexicon explicitly referred to the five 
factors, with eight additional categories derived from the temperament and
developmental literature, tentatively labeled as Independence (VI), Mature for age (VII),
Illness, handicaps, and health (VIII), Rhythmicity (IX), Gender appropriate behavior
(X), Physical attractiveness (XI), Cuddliness and clinging behavior (XII),
Relationships with siblings and parents (XIII), and finally a rest category for descriptors that
could not be classified (XIV). The first five main categories were further structured
in subcategories. The category system partly followed the FFM, but did not preclude
that other dimensions outside the FFM would emerge (Kohnstamm et al., 1995).
The 427 interviews with Flemish parents provided a total pool of 9,607 descriptors
(for details, see Kohnstamm et al., 1998). Between 70 and 80 per cent of all parental
free descriptors could be classified as instances of the Big Five in all cultures
(Kohnstamm et al., 1998). The distribution of Flemish descriptors across the 14
categories of the lexicon is described in Table 1. Mervielde (1998) further demon- 
strated that 68 per cent of all parents referred to at least four of the Big Five catego- 
ries, whereas less than 10 per cent referred to only two or to just one category. 
Table 1. Percentage of classified descriptors across coding categories


     Category label                                                         %
 1   Extraversion                                                        27.0
 2   Agreeableness                                                       19.4
 3   Conscientiousness                                                    8.4
 4   Emotional stability                                                  9.4
 5   Openness to experience/Intellect                                    12.9
 6   Independence, Ability to do things independently                     3.5
 7   Mature for age                                                       2.4
 8   Illness, handicaps, and health                                       0.6
 9   Rhythmicity of eating, sleeping, etc.                                0.9
10   Gender appropriate, physical attractiveness                          0.9
11   School performance, attitudes toward school                          4.0
12   Contact comfort, desire to be cuddled, clinging                      1.7
13   Relations with siblings and parents                                  4.3
14   Ambiguous phrases and descriptions that cannot be coded elsewhere    4.5


However, the classification results only tentatively underscore the comprehensive- 
ness and saliency of the FFM categories, because they just reflect a classification 
process, rather than providing insight into the dimensions underlying individual dif- 
ferences assembled with a free description procedure. 


From free descriptors to items 


To further structure the categorised Flemish descriptors, homogeneous groups of 
similar and content related descriptions were formed within each category by teams 
of five judges. Judges did not receive a priori guidelines about the optimal number 
or the breadth of these groups. This classification procedure was only applied to 
main categories or subcategories with more than one per cent of the total number of 
descriptors. The production of content related groups and the subsequent assignment 
of descriptors was carefully checked and supervised by two research assistants. 
About 100 groups of 'synonym' free descriptions were identified for the age groups
5 to 7, 8 to 10, and 11 to 13¹, roughly comparable across age groups. For example, a
Conscientiousness subcategory, 'Carefulness - negative', was further split into four
groups, i.e. 'orderly-neat', 'precise', 'good attention span-attentive', and
'responsible-reliable', including 18, 23, 16, and 17 free descriptions, respectively, for children
aged 8 to 10 years. A detailed overview of the Flemish free descriptor groups can be 
found elsewhere (Mervielde & De Fruyt, 1999). 

¹ We primarily focused on primary school age. Clustering of descriptors for the age group 2 to 4
years will be accomplished later on.

Although the sampled free descriptions cover a wide range of behaviours characteristic
of school-age children, they cannot be used directly as questionnaire items
because their grammatical structure is very divergent, hampering a uniform
interpretation of their meaning. Moreover, many of the descriptors are trait adjectives, or
adjectives form the descriptive core of the sentences. The primary objective of the
Flemish project was to construct an inventory with sentence items, referring to
concrete observable behaviour, rather than compiling a list with trait adjectives. Rules
developed for the construction of an adult five-factor inventory, the FFPI (Five-Factor
Personality Inventory; Hendriks, 1997), were adopted for the production of
behavioural items, in order to streamline the format of all items. Each of the items
was formulated in the third person verb form (e.g., "keeps emotions and thoughts for
oneself", "has a limited vocabulary"), did not contain a trait adjective and was
formulated in a direct form avoiding negations.

We started with the production of items for the 100 descriptor-groups that were 
identified for the children aged 8 to 10 years. For each of the homogeneous groups 
of free descriptors of this age-level, two to four items were written based on the 
content of that group. Items for the age-levels 5 to 7 and 11 to 13 were produced 
according to a similar procedure, with the objective of maximising the number of 
common items for the three age-levels, based on a comparison of the content of the 
homogeneous groups for different age-levels. This strategy resulted in initial item 
sets with 240 items for age 5 to 7, 282 items for age 8 to 10, and 234 items for age 
11 to 13. One hundred and twenty-two items were common to the three age-groups. 


Delineating the domain structure 


The resulting item sets were given to adult raters (one or both parents + teachers) 
who had to provide ratings of children's behavior on a five-point scale, anchored as 
follows: (1) Almost not characteristic, (2) Little characteristic, (3) More or less char- 
acteristic, (4) Characteristic, and (5) Very characteristic. Scores given by the two or 
three adult raters (parents/teachers) were averaged, in order to increase the reliability of 
the rating and to limit the influence of any particular rater perspective. Preliminary prin- 
cipal component analyses at the item level indicated that for each age-level, the first five 
principal components tended to group items according to each of the FFM categories 
that were used to sort the free descriptors. The correspondence of the structure of the 
initial pool of behavioural items with the "lexical Big Five" was further checked for 
age groups 5 to 7 and 8 to 10. For both age-levels additional ratings of each child 
were available on the BSBBS-25, an instrument with 25 bipolar scales marking the 
five factors (BSBBS-25; Mervielde, 1992), selected from Goldberg's Big Five ad- 
jective markers (Goldberg, 1989). In addition, ratings on trait adjectives culled from 
the free descriptions were available. Observers provided ratings on 143 trait adjec- 
tives for age-level 5 to 7 years and on 152 adjectives for age-level 8 to 10 years. A 
joint principal component analysis of the components extracted from the behavioural 
items, with ratings of the same children on the BSBBS-25 (Mervielde, 1992) scales 
and five components extracted from ratings on trait adjectives confirmed the corre- 
spondence between the principal components extracted from the behavioural item 
sets and the Big Five as conceived in the lexical approach. Therefore it was decided 
to systematically extract and rotate five components to construct age-specific ques- 
tionnaires, assessing broad and more specific traits. 
Table 2. Phases for the selection of items and the composition of clusters and facets


Step  Construction phase  Method of item selection
 1    Scales 1            Group items into scales based on the free description cluster
                          that was used to write them (100 clusters, 100 scales)
 2    Scales 2            Compute alpha reliability for each scale:
                          - put items that lower alpha into single-item scales
                          - split scales with alpha < .60 into single-item scales
 3    Scales 3            Principal component analysis of all scales:
                          - extract and varimax rotate 5 components
                          - drop scales with communalities < .30
 4    Scales 4            Reassign single-item scales to multi-item scales based on
                          correlation analysis
 5    Scales 5            Principal component analysis of all scales:
                          - extract and varimax rotate 5 components
                          - assign scales to the highest loading component
 6    Facets 1            Principal component analysis of items within each of the
                          5 components:
                          - extract and rotate (Oblimin) 2 to 7 components
                          - assign items to facets based on the PC-analyses
 7    Facets 2            Compute alpha reliability for each facet:
                          - remove items that lower alpha
                          - re-assign some items based on correlational analysis
 8    Facets 3            Principal component analysis of all facets:
                          - extract and varimax rotate 5 components
                          - remove facets with low communalities
 9    Facets 4            Select 8 items for each facet based on:
                          - item-total correlation
                          - contribution to simple structure
10    Facets 5            Compute alpha reliability for each facet
11    Facets 6            Principal component analysis of facets


For each age-level, items were first grouped into scales based on the free descrip- 
tion group that was used to produce them. Cronbach alpha coefficients for each scale 
were computed, and items that lowered the alphas were re-assigned to single-item
scales. In addition, multi-item scales with alphas < .60 were also split into
single-item scales. Multi- and single-item scales were then submitted to a new principal
component analysis per age-level, followed by varimax rotation of five components. 

Scales with communalities < .30 were dropped. Single-item scales from the
previous phase were reassigned to the highest correlating multi-item scale. A final
principal component analysis of all scales per age-level was conducted, retaining five
varimax-rotated factors. The items primarily loading on each component defined the 
item domain of analysis for constructing facets within these particular components. 
An outline of the different steps to define the content domains is presented in the top 
section of Table 2. 
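
The alpha-based screening in steps 1 and 2 of Table 2 can be illustrated with a minimal sketch. The ratings below are invented for illustration, and the function names are hypothetical; the chapter does not describe the software that was actually used. An item is flagged when alpha for the remaining items is higher than alpha for the full scale.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    `items` is a list of item-score lists, one list per item, each holding
    the scores of the same children in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]
    item_variance = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - item_variance / pvariance(totals))


def items_lowering_alpha(items):
    """Indices of items whose removal would raise the scale's alpha."""
    if len(items) < 3:
        return []  # removing an item would leave fewer than two items
    base = cronbach_alpha(items)
    return [
        i for i in range(len(items))
        if cronbach_alpha(items[:i] + items[i + 1:]) > base
    ]


# Invented averaged ratings of five children on four items (not HiPIC data).
ratings = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
    [5, 1, 4, 2, 3],  # an item that does not follow the others
]
print(cronbach_alpha(ratings) < 0.60)   # scale falls below the .60 criterion
print(items_lowering_alpha(ratings))    # the last item is flagged
```

In the procedure described above, a flagged item would be moved to a single-item scale rather than discarded, so that it could still be reassigned in a later phase.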
Constructing facets within domains


To infer a facet structure within each of the five components, all items primarily 
loading on a particular component were submitted to principal component analysis, 
followed by oblimin rotation of two to seven components. A similar procedure was 
followed to investigate the domain structure of each age group. The number of com- 
ponents to be retained within a domain was determined by a combination of criteria, 
including inspection of the scree plot, the total amount of explained variance, the
communalities of the items, and the number of items loading on each within-domain factor.
Oblimin rotation was preferred, given the substantial variance shared by items, pri- 
marily loading on the same component in the domain analysis. 

Items were subsequently assigned to facets based on the results of the principal
component analyses. Alpha reliabilities for the resulting facets were computed, and
items that lowered the alphas were removed and re-assigned to new facets based on
their intercorrelation pattern. A new principal component analysis of the facets was
then conducted to remove facets with low communalities. For each facet, 8 items
were selected based on their item-total correlation and on their contribution to
simple structure. The alpha reliabilities of the final selection of 8 items per facet were
computed, followed by principal component analysis of the facet scales. The
different steps in the facet construction process are described in the bottom section of
Table 2.

The final number of items for each of the facets was restricted to 8 in order to 
avoid too lengthy questionnaires. Inspection of the reliabilities and of average inter- 
item correlations shows that the facets are homogeneous with alpha reliabilities 
ranging from .85 to .94 for the youngest age group, from .86 to .95 for the middle 
group, and from .85 to .93 for the oldest children. Given the rather high average
inter-item correlations, one can be fairly sure that the contents of the items belonging
to a given facet are highly similar. Constructing facets always entails a difficult 
choice between bandwidth and fidelity. Bandwidth was emphasized initially, be- 
cause broad sets of categories and clusters were used to organise parental free de- 
scriptions and to write items for the initial item pools per age group. Homogeneity 
of content within a facet was emphasized in the aggregation process of descriptor- 
groups and of items into facets to come up with reliable facet scales with a stable 
and replicable position within the personality descriptive model. By shifting the em- 
phasis from bandwidth to fidelity the content validity may be reduced somewhat, but 
this is the price to be paid for a stable facet structure. 

The 11-step construction procedure outlined in Table 2 led to three questionnaires
with a rather similar structure, covering behavioural differences observable in chil- 
dren 5 to 7, 8 to 10, and 11 to 13. Nineteen different facets were delineated from the 
parental free descriptions across the three age groups, with 17 facets being similar 
across ages. Two facets could not be recovered in each age group, namely Altruism and 
Independence. The Altruism facet was only prominent in the age groups 5 to 7 and 11 to 
13, while Independence was only represented in the descriptors of children aged 8 to 10. 
The 17 common facets proved to have stable primary loading patterns in the FFM
framework across ages, although some of the domain labels had to be adapted to
accommodate a broader spectrum of traits observable in children.

The composition of the first childhood domain factor, Conscientiousness, was the
same across the three age groups, including the facets Achievement Striving, Order,
Perseverance and Concentration. Three of these facets are closely related to facets 
defined in the NEO-PI-R (Costa & McCrae, 1992). Order and Achievement Striving 
have an identical label, while Perseverance is closely related to Self-discipline, de- 
scribed as “refers to the ability to begin tasks and carry them through to completion 
despite boredom and other distractions" (Costa & McCrae, 1992, p. 18). Finally, 
Concentration as such is not included in the adult NEO-PI-R model but it may be 
related to Dutifulness and to Deliberation. The greater emphasis on Concentration 
may further reflect a major concern of parents judging children. 

The second childhood domain factor, related to Agreeableness, contains a broad 
spectrum of facets — including Altruism, Dominance, Egocentrism, Compliance, 
and Irritability — belonging to different factors in adult FFM operationalizations. 
Facets are more evaluatively negative in nature here, referring to characteristics of 
the "easy versus difficult child" as conceived in the temperament literature. To dis- 
tinguish this broader content from the adult Agreeableness factor, the factor was 
labelled as Benevolence, including typical adult Agreeableness facets such as Com- 
pliance, Egocentrism, and Altruism, but also facets that are primarily related to other 
Big Five factors in adults such as Dominance and Irritability. Dominance primarily 
loads on the Extraversion domain in adults, whereas Irritability is more related to
Neuroticism. Possible reasons for this broader domain content are that the Benevol- 
ence factor refers to a sort of Externalizing factor, also represented in the Child Be- 
havior Checklist (CBCL; Achenbach, 1991; see also later in this chapter for the em- 
pirical relationship). In addition, the broader spectrum may also result from using 
parents as primary informants for both defining and rating the personality domain. 
The traits captured by the Benevolence factor are conceptually related to "manage- 
ability of the child" and may hence reflect a dimension with a strong socially 
evaluative connotation for caregivers. 

The third childhood domain factor, Extraversion, is composed of the same four fac- 
ets at each age level, namely Shyness, Expressiveness, Optimism, and Energy, all 
having clear counterparts in the adult personality literature. However, the facets' 
loadings on the domain vary across age. Shyness is the highest loading facet for age 
5 to 7, while Energy is the highest loading facet for the oldest group. Optimism is 
apparently linked to the Positive emotions facet of Extraversion as measured by the 
NEO-PI-R, while Expressiveness refers to the open expression of emotions and to 
talkativeness. 

The fourth childhood domain factor is called Imagination. In the Big Five litera- 
ture the related factor has been variously interpreted as Intellect, Culture, or Open- 
ness to Experience. However, none of these labels seem to adequately represent the 
common core among the Creativity, Curiosity, and Intellect facets that emerged in 
the present research for each of the three age groups. In agreement with a proposal
by Saucier (1992) the label Imagination was chosen, referring to both Creativity and
Intellect. 

Finally, the smallest factor in terms of the number of facets is Emotional Stabil- 
ity, including Anxiety as a factor-pure marker, and Self-confidence with a secondary 
loading on Extraversion. Independence was only recovered from the item set for age 
group 8 to 10, and it also covaried with three other domains. Parallel to findings of
the lexical approach (Goldberg, 1993), it seems that the Emotional Stability factor is
also more restricted in content than the four others when analyzing free descriptions. 


A common inventory for 6 to 12


Although the objective was the construction of three different age-specific
questionnaires, the large number of similar facets across age groups is indicative of the high
degree of overlap among the three questionnaires. More than half the items were
common to the three instruments and even more were common to two of the three.
This correspondence is of course the result of our policy to start from a highly com- 
mon set of categories and descriptor-groups to cover the content of free parental 
descriptions. In addition, items that were already part of another age-specific ques- 
tionnaire were preferentially selected to assess similar traits in another age group in 
order to further strengthen the common core. 

Focussing on the common core rather than on the diversity across age groups has 
several advantages. From a practical point of view, there is a limit to the degree of 
differentiation. The validation of factors and facets for age-specific questionnaires 
requires many studies, and an equally differentiated set of age-specific criterion 
measures may not be available. Moreover, for research purposes it is often difficult 
to collect data from sufficient numbers of subjects within a specific age-group. For 
individual diagnosis, narrow age-ranges require separate norms for each age group. 
Finally, comparison across age levels is only feasible for the common set of items 
that may turn out to have yet a divergent structure. That common structure may be 
biased by an unequal distribution of items and/or facets from one particular item-set. 
Because quantitative comparison of mean levels and structure across age groups has 
to be based on a set of common items, emphasising a broad common core of items 
seems to be the most adequate strategy for research purposes as well as for diagnos- 
tic applications. 

Mervielde and De Fruyt (1999) computed Tucker congruence coefficients be- 
tween the factor matrices of the three age groups, over the set of 17 facets common 
to the three instruments, in order to empirically examine the relationships between 
the three age-specific questionnaires. The congruencies between age group 5 to 7 
and 8 to 10 varied from .95 to .98. For the age groups 8 to 10 and 11 to 13 coeffi- 
cients ranged from .94 to .99. The congruence between the two most distant groups 
(5 to 7 and 11 to 13) ranged from .90 to .97. By dropping the non-common
facets, Altruism and Independence, the congruence coefficients overestimate the true
degree of correspondence. Nevertheless, the fact that all observed congruencies are
.90 or higher confirms that there is a high degree of similarity among the three
age-specific questionnaires. For this reason, but also for the development of an
instrument applicable to a reasonably broad age range, it was decided to merge the
three questionnaires into one instrument, targeted at the primary school age (6 to 12 
years). 
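
For readers unfamiliar with the index, Tucker's congruence coefficient between two columns of loadings x and y is the sum of their cross-products divided by the square root of the product of their sums of squares. The sketch below uses invented loadings for four facets, not the values reported by Mervielde and De Fruyt (1999), and the function name is hypothetical.

```python
from math import sqrt


def tucker_congruence(x, y):
    """Tucker's congruence coefficient between two loading vectors.

    x and y hold the loadings of the same facets, in the same order,
    on one component in two different samples (e.g. two age groups).
    """
    if len(x) != len(y):
        raise ValueError("loading vectors must cover the same facets")
    cross = sum(a * b for a, b in zip(x, y))
    norm = sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return cross / norm


# Invented loadings of four facets on one component in two age groups.
younger = [0.82, 0.75, 0.10, -0.05]
older = [0.78, 0.80, 0.02, 0.04]
print(round(tucker_congruence(younger, older), 3))
```

Because the coefficient is computed only over the facets present in both matrices, facets that exist in one age group alone cannot lower it, which is why dropping non-common facets tends to inflate the apparent correspondence.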

The final inventory included the 17 common facets, together with Altruism that 
was common to age groups 8 to 10 and 11 to 13. Independence as a facet was 
dropped because it only emerged from the analyses of age group 8 to 10. To decide 
which items were to be included for each of the 18 facets, the following rules were 
adopted. Items loading on a given facet in each of the three age-specific question- 
naires were retained for the final version. Then, items that were common to two of 
the three age-groups were added. If at that stage the total number of 8 items was not 
reached, items were added that were specific to the youngest age group. By pooling 
all averaged ratings across samples, it was possible to compute new facet scores for 
each of the 18 facets based on 6 to 8 items. A principal component analysis of the 
facet scales, followed by Varimax rotation, clearly confirmed the presumed structure 
of the integrated HiPIC scales, with 80.7 per cent of the variance
explained by the first five rotated components.
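
The item selection rules for the merged inventory can be sketched as follows. The item identifiers and age-group labels are invented. Two details are assumptions of this sketch rather than rules stated in the text: the selection is capped at 8 items, and ties within a priority level are broken alphabetically.

```python
def select_facet_items(items_by_age, youngest="5-7", target=8):
    """Pick items for one merged facet.

    `items_by_age` maps an age-group label to the set of item ids that
    load on the facet in that age group's questionnaire.
    Priority: items shared by all three groups, then items shared by two
    groups, then (only if still short of `target`) items specific to the
    youngest group.
    """
    groups = list(items_by_age.values())
    all_items = set().union(*groups)

    def n_groups(item):
        return sum(item in group for group in groups)

    in_three = sorted(i for i in all_items if n_groups(i) == 3)
    in_two = sorted(i for i in all_items if n_groups(i) == 2)
    youngest_only = sorted(
        i for i in items_by_age[youngest] if n_groups(i) == 1
    )

    # Assumption: never exceed `target` items per facet.
    selected = (in_three + in_two)[:target]
    if len(selected) < target:
        selected += youngest_only[: target - len(selected)]
    return selected


# Invented item ids for one facet in the three age-specific questionnaires.
facet_items = {
    "5-7": {"a", "b", "c", "d", "e", "f", "g", "h"},
    "8-10": {"a", "b", "c", "d", "e", "i", "j", "k"},
    "11-13": {"a", "b", "c", "d", "f", "i", "l", "m"},
}
print(select_facet_items(facet_items))
```

A facet whose three sources share too few items, and for which the youngest group contributes few specific items, would end up with fewer than 8 items under these rules.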


Table 3. HiPIC domains, facets and sample items


Domains/facets              Sample item
Conscientiousness
  Achievement motivation    wants to shine at everything
  Concentration             works with sustained attention
  Perseverance              perseveres until the goal is achieved
  Orderliness               leaves everything lying around (RK)
Benevolence
  Egocentrism               finds it hard to share with others
  Irritability              is quick to take offence
  Compliance                obeys without protest
  Dominance                 acts the boss
  Altruism                  defends the weak
Extraversion
  Shyness                   tries to establish contact with new class-fellows (RK)
  Optimism                  sees the sunny side of things
  Expressiveness            keeps feelings and thoughts to him/herself (RK)
  Energy                    has an excess of energy
Emotional Stability
  Anxiety                   is quick to worry about things
  Self-confidence           takes decisions easily (RK)
Imagination
  Creativity                derives pleasure from creating things
  Curiosity                 likes to learn new things
  Intellect                 is quick to understand things


Note: RK = reverse-keyed item
Instrument characteristics


The final Hierarchical Personality Inventory for Children (HiPIC; Mervielde & De 
Fruyt, 1999) includes 144 items, 8 items per facet, assessing 18 facets hierarchically 
structured under the domains. The production process of the inventory was directed 
at an adequate and psychometrically sound representation of the content enclosed in
parental free descriptors, empirically aggregated to more reliable facets and do- 
mains. An overview of the hierarchical structure and a sample item per facet are 
presented in Table 3. 


Structural replicability 


The common HiPIC version was given to the parents of a new sample of 719 twins 
and their siblings aged between 5 and 13 (De Fruyt & Mervielde, 1998). Both par- 
ents provided independent ratings of the children. Ratings were averaged across par- 
ents, and the facet scales were subsequently submitted to principal component 
analysis. The factor loading matrices for the total sample, and for boys and girls 
separately, are presented in Table 4. Inspection of these matrices shows that the 
HiPIC structure is highly replicable in an independent new sample, with all facet 
scales loading on the expected components. Moreover, the factor structure proves 
invariant for boys and girls, with only minor deviations in primary and secondary 
loading patterns. The alpha reliabilities parallel the coefficients found in the
construction samples, all ranging from .81 (Self-confidence) to .92 (Orderliness),
with mean inter-item correlations varying between .36 (Self-confidence) and .58 
(Orderliness). These psychometric findings have been recently confirmed in studies 
with both clinical and non-clinical subjects (Mus, 2000; Van Leeuwen, 2000; Van 
Hoecke, 2000; Vanoutrive, 2001) that included the HiPIC as part of their assessment 
battery. In all studies, the assumed factor structure was clearly replicated,
underscoring adult findings that the FFM is also useful for conceptualizing individual differences
in clinical samples (Costa & Widiger, 1994).
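
The reported alphas and mean inter-item correlations are mutually consistent, as a short check suggests. The sketch assumes 8 items per facet and uses the standardized (Spearman-Brown) form of alpha, which is an approximation to the raw alpha that was presumably reported; the function name is hypothetical.

```python
def standardized_alpha(k, mean_r):
    """Standardized alpha for k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Mean inter-item correlations reported for the two extreme facets.
for facet, mean_r in [("Self-confidence", 0.36), ("Orderliness", 0.58)]:
    print(facet, round(standardized_alpha(8, mean_r), 3))
```

Under these assumptions the formula gives roughly .82 and .92, close to the reported .81 and .92; small differences are to be expected from rounding and from unequal item variances.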


Convergent and discriminant validities across observers 


De Fruyt and Vollrath (submitted) recently examined the convergent and
discriminant validities across parents in a combined Flemish (N = 104) and Swiss (N = 205)
sample. A comparison of the maternal and paternal correlational pattern in the total
Flemish and Swiss sample enables an analysis of the convergent and discriminant
validity of the HiPIC domain and facet scales across observers. At the domain level,
the median convergent correlation of paternal and maternal factor scores was .74,
with a discriminant median validity across observers and domains of .02. The
median convergent validity coefficients for the facet scales of each HiPIC domain
were .70, .65, .77, .65, and .74 for Extraversion, Benevolence, Conscientiousness,
Stability, and Imagination, respectively. The semi-convergent validity across
observers for a domain is indicated by the absolute median validity coefficients across
observers for the facets in a domain, without the correlations on the diagonal. For
example, paternal Creativity (I1) ratings were correlated with maternal Intellect (I2)
and Curiosity (I3) ratings; paternal Intellect (I2) ratings with maternal Creativity (I1)
and Curiosity (I3) ratings; and finally, paternal Curiosity (I3) ratings with maternal
Creativity (I1) and Intellect (I2) ratings. Absolute validity coefficients should be taken,
because for three of the five HiPIC domains, facet labels refer to opposite poles of
the same domain. Given the unequal number of facets per domain, there are 12
such correlations for Extraversion, and 20, 12, 2, and 6 for Benevolence,
Conscientiousness, Stability, and Imagination, respectively. The absolute median
semi-convergent validities were .34, .33, .55, .43, and .40, respectively, and were about
.20 to .30 lower than the convergent coefficients. Finally, the discriminant validities
for the facets in a domain are indicated by the absolute median correlation between
the facets of a domain and all facets of the other HiPIC domains. For example,
paternal E1, E2, E3, and E4 ratings were correlated with all maternal B-, C-, S-, and
I-facet ratings and vice versa. There are 112 such discriminant correlations for the
E-domain and 130, 112, 64, and 90 for B, C, S, and I, respectively. The absolute
median discriminant validities for these domains were .12, .12, .14, .13, and .15.
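
The numbers of semi-convergent and discriminant correlations follow directly from the number of facets per domain in Table 3. The sketch below enumerates all pairs of one paternal and one maternal facet rating and classifies them; the facet labels (E1, B1, and so on) and the function name are illustrative.

```python
from itertools import product

# Number of facets per HiPIC domain, as listed in Table 3.
FACETS_PER_DOMAIN = {"E": 4, "B": 5, "C": 4, "S": 2, "I": 3}
FACETS = [
    (domain, index)
    for domain, n in FACETS_PER_DOMAIN.items()
    for index in range(1, n + 1)
]


def count_correlations(domain):
    """Return (semi-convergent, discriminant) counts for one domain.

    Each pair is (paternal facet, maternal facet). Same facet rated by
    both parents is convergent and is not counted here.
    """
    semi = discriminant = 0
    for (p_dom, p_idx), (m_dom, m_idx) in product(FACETS, repeat=2):
        if p_dom == m_dom == domain and p_idx != m_idx:
            semi += 1  # different facets of the same domain
        elif p_dom != m_dom and domain in (p_dom, m_dom):
            discriminant += 1  # facet of this domain with another domain
    return semi, discriminant


for domain in FACETS_PER_DOMAIN:
    print(domain, count_correlations(domain))
```

The enumeration counts both directions (paternal E with maternal B, and paternal B with maternal E), which is what "and vice versa" implies for the discriminant correlations.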

In summary, the convergent validities were substantially higher than the semi- 
convergent validities, as they should be, and the absolute median discriminant 
validities were between .12 and .15. Ideally, they should be close to zero, but rater
biases and social evaluative meaning contribute to the intercorrelation pattern. All in 
all, these analyses underscore the convergent, semi-convergent, and discriminant 
validities of the facets. 


Temporal stability 


De Fruyt and Mervielde (2000) investigated the stability of the HiPIC traits across a 
three-year interval in a longitudinal twin-family study, including parental ratings of 
twins and siblings. Stability coefficients for the domains, not corrected for unreli- 
ability, ranged from .59 (Emotional Stability) to .76 (Imagination). The stability co- 
efficients for the Conscientiousness, Extraversion, Benevolence, and Imagination 
facets were roughly comparable across domains, with those for Emotional Stability 
being about .10 lower. The domains had slightly higher stability coefficients than the 
facets. Stability coefficients of domains and facets were further comparable across 
fathers and mothers. 
Construct validity


Although the HiPIC was primarily conceived as an observer inventory, it is also 
useful as a self-rating instrument for adolescents aged between 12 and 15. De Fruyt 
et al. (2000) examined the interrelationships between self-reported HiPIC and NEO- 
PI-R adolescent ratings, demonstrating absolute correlation coefficients between .70 
and .74 between four of the corresponding domain scales, and a smaller correlation 
between the Openness to Experience and Imagination domains (r = .45). Moreover, 
facets with similar content or conceptually related labels in the two inventories cor- 
related in the .60 to .70 range, underscoring the construct validity of the HiPIC facets. 
Tucker congruence coefficients for self- and parental ratings on the HiPIC averaged .95, 
ranging between .87 and .98. 

Van Leeuwen (2000) investigated the relationships with the Child Behavior 
Checklist (CBCL; Achenbach, 1991; Verhulst, Van der Ende, & Koot, 1996) in a 
population study on parenting behavior, demonstrating that (un)Benevolence correlated 
.53 and Conscientiousness -.32 with Externalizing behavior, while (un)Stability 
correlated .49 and Extraversion -.21 with Internalizing scores. The correlations with 
the remaining domains were all below |.20|. These findings underscore the external 
validity of the HiPIC domains, suggesting that the CBCL and HiPIC include com- 
mon variance. 

Finally, the HiPIC scales proved also useful for person-centered descriptive 
analyses. De Fruyt and Mervielde (2000) clustered HiPIC raw domain scores*, first 
according to Ward's method followed by a hierarchical K-Means cluster analysis. 
The resulting clusters clearly corresponded to prototypes previously described in 
person-centered analyses of other FFM measures (Asendorpf & Van Aken, 1999). 
Comparable to Asendorpf and Van Aken (1999), the resilients had an overall well- 
adjusted profile, whereas the undercontrollers had on average substantially lower 
Benevolence and Conscientiousness scores. The overcontrollers had on average high 
Neuroticism, but lower Extraversion scores. 


Conclusions 


In sum, the HiPIC can be considered a highly comprehensive personality inventory 
for assessing individual differences in children. The instrument is broadly applicable 
in both research and professional practice when a detailed assessment of individual 
differences in children is required. Its 144 short and grammatically similar 
items are grouped into 18 facets, hierarchically structured under five broad 
domain factors. The HiPIC can be used as an observer inventory by parents and 


* We usually compute factor scores for the HiPIC domains, unless otherwise specified. Raw domain 
scores are sum scores derived from aggregation of facets (after reversing facets that are 
oppositionally keyed). 
teachers, but can also be used for self-reports by adolescents aged between 12 and 
15. It takes about 15 to 20 minutes to fill out, and instructions are kept simple and 
to a minimum, asking informants to rate on a five-point scale whether each item can be 
considered characteristic of the child. The scoring and the computation of facet and 
domain scores are preferably done automatically to minimize keying and computation 
errors. For a subset of items, raw scores have to be reversed to key them in line with 
the facet label. After reversing, facet scores can be easily computed through 
aggregation of the 8 item scores. Usually factor scores are computed as indicators of 
the five domains, starting from a principal component analysis of the 18 facets, and 
are hence independent of each other. Detailed instructions on 
administration, scoring, and interpretation, and different norm sets, including norms 
for population and clinical samples, are available from the authors. Preliminary 
English and German translations of the instrument are currently used in different 
research projects. The HiPIC will be commercially available by the beginning of 
2002, distributed by a European Test Publisher. Colleagues interested in using the 
instrument for research or other purposes in the meantime are requested to contact one 
of the authors. 
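The facet scoring described above (reverse a subset of items, then aggregate the 8 item scores of a facet) can be sketched as follows. This is a minimal illustration, not the authors' scoring program: it assumes the five-point scale is coded 1 to 5, that aggregation means summing, and the positions of the reversed items in the example are hypothetical.

```python
SCALE_MIN, SCALE_MAX = 1, 5  # assumed coding of the five-point scale
ITEMS_PER_FACET = 8          # 144 items grouped into 18 facets


def reverse(score):
    """Reverse-key a single item score on the assumed 1-5 scale."""
    return SCALE_MIN + SCALE_MAX - score


def facet_score(item_scores, reversed_positions=()):
    """Sum the 8 item scores of one facet after reverse-keying.

    reversed_positions holds the 0-based positions (within the facet)
    of the items that are keyed against the facet label.
    """
    if len(item_scores) != ITEMS_PER_FACET:
        raise ValueError("a facet score needs exactly 8 item scores")
    if any(not SCALE_MIN <= s <= SCALE_MAX for s in item_scores):
        raise ValueError("item scores must lie on the five-point scale")
    return sum(
        reverse(s) if i in reversed_positions else s
        for i, s in enumerate(item_scores)
    )


# Hypothetical facet in which the second and fourth items are reverse-keyed.
example = facet_score([4, 2, 5, 1, 3, 4, 2, 5], reversed_positions={1, 3})
print(example)
```

Domain factor scores would require a principal component analysis of the 18 facet scores in a full sample and are therefore not shown here.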

The availability of an inventory assessing primary and secondary order traits in 
primary school children fills a gap in current personality development research and 
assessment practice. Furthermore, the rationale behind the construction of the HiPIC 
extends the lexical approach of personality description, investigating the structure of 
the active and age-specific personality descriptive vocabulary. The analysis of the 
parental free descriptions strikingly parallels Shiner's (1998) literature search for the 
most important constructs to describe individual differences in children. Both the 
empirical and the conceptual analysis of children's individual differences do not 
reveal important dimensions outside the five-factor framework and thus extend and 
confirm the results from the lexical approach starting from trait adjectives. Although 
the labels for two of the five HiPIC domains are different from the adult Big Five, 
the previous analyses have demonstrated that they can be conceptually and empiri- 
cally related to the adult factors. Longitudinal research will ultimately demonstrate 
whether these dimensions and facets can be considered as developmental precursors 
of the adult dimensions and their lower level traits. 

Finally, we hope that the development of the HiPIC may further contribute to 
increased attention to childhood individual differences in assessment practice. The 
wealth of instruments assessing adult traits, compared to the relatively few instruments 
applicable to childhood personality description included in this volume, illustrates 
the need for a specific, robust, and comprehensive inventory primarily designed for 
childhood personality assessment. We are therefore looking forward to 
suggestions and experiences of both researchers and practitioners with the applica- 
tion of the HiPIC. 


Chapter 7 


The Structured Interview for the Five Factor 
Model of Personality (SIFFM) 


Timothy J. Trull 
Thomas A. Widiger 


Introduction 


The Structured Interview for the Five Factor Model of Personality (SIFFM; Trull & 
Widiger, 1997) is a semi-structured interview that assesses adaptive and maladaptive 
variants of traits relevant to the Five-Factor Model of personality (FFM). As readers 
of this volume know, the FFM consists of the following bipolar trait dimensions: (1) 
Neuroticism (vs. emotional stability); (2) Extraversion (vs. introversion); (3) Open- 
ness to Experience (vs. closedness to experience); (4) Agreeableness (vs. antago- 
nism); and (5) Conscientiousness (vs. negligence) (McCrae & Costa, 1990). Each of 
these broad domains can also be differentiated into underlying facets (i.e., primary 
or first-order personality traits). 

There are a number of reasons why we developed a semi-structured interview for 
the assessment of the FFM. Much of our research and clinical work has focused on 
personality disorders (e.g., Trull, 1995; Trull & Widiger, 1991; Trull, Widiger, & 
Guthrie, 1990; Widiger, Mangine, Corbitt, Ellis, & Thomas, 1995; Widiger & 
Sanderson, 1997). However, empirical research and our own clinical experiences 
have led us to prefer the FFM as a model of personality disorder, as well as of normal 
personality functioning (Trull, 1992; 2000; Widiger, 2000; Widiger & Trull, 1992). 
Why this preference? We believe that the FFM offers many advantages over existing 
models of personality disorder. 

No doubt, the diagnosis of DSM-IV personality disorders is of substantial clinical 
and social importance (Widiger & Sanderson, 1997). Disorders of personality func- 
tioning have been recognized since the beginning of medicine and within each edi- 
tion of the major diagnostic manuals of mental disorders because many patients do 
appear to present with problems that are best understood as resulting from long- 
standing maladaptive personality traits (Livesley, 2001; Millon et al., 1996). For 
example, persons who have met the various diagnostic criterion sets for Antisocial 
Personality Disorder have been shown to be at significant risk for unemployment, 
impoverishment, injury, violent death, substance and alcohol abuse, incarceration, 
recidivism (parole violation), and significant relationship instability (Robins, Tipp, 
& Przybeck, 1991; Stoff, Breiling, & Maser, 1997). Dependent personality traits 
have been shown to be associated with excessive and maladaptive efforts to main- 
tain relationships and with a vulnerability to episodes of depression in response to 
interpersonal loss (Blatt & Zuroff, 1992; Bornstein, 1992; Santor & Zuroff, 1997). 
Narcissistic personality traits have been associated with the occurrence of antago- 
nistic, aggressive, and even violent reactions to threats and injuries to self-esteem 
(Bushman & Baumeister, 1998; Rhodewalt, Madrian, & Cheney, 1998). Borderline 
Personality Disorder has been associated with a wide variety of maladaptive out- 
comes (e.g., death by suicide, relationship instability, personal distress, and eating, 
mood, substance, and dissociative disorders) that often have substantial public health 
costs (Gunderson, 1984; Linehan & Heard, 1999). Personality disorders also influ- 
ence significantly the occurrence, expression, course, and/or treatment of most other 
mental disorders (Shea, Widiger, & Klein, 1992) as well as themselves being the 
focus of therapeutic interventions (Perry, Banon, & Ianni, 1999; Sanislow & 
McGlashan, 1998). 

However, personality disorders are among the most problematic to diagnose (Ma- 
ser, Kaelber, & Weise, 1991; Perry, 1992; Widiger & Coker, in press; Zimmerman, 
1994). For example, because of the polythetic nature of the diagnostic criteria for the 
DSM-IV personality disorders, there is great heterogeneity among persons who re- 
ceive the same personality disorder diagnosis (Trull, 2000; Widiger, 1993). Thus, 
two individuals diagnosed with Borderline Personality Disorder may present with 
different combinations of symptoms and different clinical presentations. Second, the 
DSM-IV makes an arbitrary distinction between personality disorders and “normal” 
personality functioning (Widiger & Corbitt, 1994). No rationale or empirical support 
has ever been provided for most of the personality disorder diagnostic thresholds, 
and these arbitrary thresholds result in a significant loss of information. Finally, 
the comorbidity or co-occurrence among personality disorder diagnoses is 
substantial (Oldham et al., 1992; Trull, 2000; Widiger & Rogers, 1989) and inconsistent 
with "the categorical perspective that Personality Disorders represent qualitatively 
distinct clinical syndromes" (APA, 1994, p. 633). It is not clear how much 
more clinically useful information is garnered by listing three or four diagnoses (the 
average number of personality disorder diagnoses received by individual patients) 
that are themselves overlapping and still inadequate in characterizing all of the im- 
portant adaptive and maladaptive personality traits that are present within each indi- 
vidual patient. The inclusion of new personality disorder diagnoses within the APA 
diagnostic manual is unlikely to occur (Pincus, Frances, Davis, First, & Widiger, 
1992), yet many also feel that the DSM-IV does not provide enough coverage of 
maladaptive personality traits that will often be the focus of clinical treatment 
(Westen & Arkowitz-Westen, 1998; Widiger, 1993). 


An FFM alternative: The SIFFM 


A dimensional, quantitative assessment of personality traits can address many of 
these limitations of the DSM-IV personality disorder diagnoses (Costa & Widiger, 
in press). This is the approach of the SIFFM. The traits that are assessed by the 
SIFFM are those identified within the dimensional Five-Factor Model (FFM) or Big 
Five Model (BFM) of personality, as described by Costa and McCrae (1992), 
Digman (1990), Goldberg (1992), Tellegen and Waller (in press), and others. There 
are alternative dimensional models of personality (e.g., Benjamin, 1993; Clark, 
1993; Cloninger, Svrakic, & Przybeck, 1993; Millon & Davis, 1994), but we believe 
that the Big Five Model (BFM) and, more specifically, the FFM have certain fea- 
tures that result in significant advantages over these alternatives (Costa & Widiger, 
1994; in press; Trull, 2000; Wiggins, 1996). 

First, compared to the DSM-IV, which is theoretically diverse, the BFM/FFM is 
more theoretically neutral. These models do not represent the personal or theoretical 
views of any theorist. Rather, their derivation was based on the idea that the most 
important personality traits could be identified by sampling trait terms that appeared 
most frequently within the natural language. A second advantage is the substantial 
empirical support for the BFM/FFM, much more support than has been provided for 
the DSM-IV description of personality disorders. Finally, the BFM/FFM is much 
more comprehensive than the DSM-IV classification. The BFM/FFM, in a fairly 
succinct way, provides a description of a wide range of both adaptive and maladap- 
tive personality traits. Rather than being a listing of relatively obscure traits that 
have little relevance, the BFM/FFM traits include major normal and abnormal per- 
sonality traits that have been the subject of much research and clinical attention. 


Development of the SIFFM 


The SIFFM was developed in order to provide an interview measure of the Five- 
Factor Model of personality. Although several self-report measures exist for this 
purpose, no structured interview to assess the Big Five or FFM has ever been devel- 
oped. Previously, we discussed the desire and need for an interview instrument that 
assesses the Five-Factor Model (Widiger & Costa, 1994; Widiger & Trull, 1992, 
1997). An interview-based measure is important for several reasons. First, there is a 
great deal of concern that self-report inventory scores may be significantly affected 
by current mood state. Although interview based scores may not be completely im- 
mune from such an influence, their format (which allows for clarifications and addi- 
tional probes) suggests that interviews may be less susceptible to the influence of 
temporary mood states. Second, many mental health professionals seem to prefer 

Table 1. Domains and facets of personality assessed by the Structured Interview for the Five- 
Factor Model of Personality (SIFFM) 

Domain                   Facets 
Neuroticism              Anxiety; Hostility; Depression; Self-Consciousness; Impulsiveness; Vulnerability 
Extraversion             Warmth; Gregariousness; Assertiveness; Activity; Excitement Seeking; Positive Emotions 
Openness to Experience   Fantasy; Aesthetics; Feelings; Actions; Ideas; Values 
Agreeableness            Trust; Straightforwardness; Altruism; Compliance; Modesty; Tendermindedness 
Conscientiousness        Competence; Order; Dutifulness; Achievement Striving; Self-Discipline; Deliberation 


interview-based measures of personality and psychopathology (Rogers, 1995). For 
example, the discrepancy between self-report inventory and interview-based scores 
is well-documented (Perry, 1992; Zimmerman, 1994). When faced with such a dis- 
crepancy, clinicians and clinical researchers will often assume that interview-based 
assessments are likely to be more valid because the assessment occurred face-to-face 
and because there were opportunities for follow-up questions and probes (Perry, 
1992). Finally, in order to establish the construct validity of these personality dimen- 
sions and traits, instruments using different modes of assessment (other than self- 
report) are necessary. Therefore, from both a clinical and research perspective, an 
interview assessing the personality traits that are included in the Five-Factor Model 
of Personality is needed and, for some clinicians and researchers, can at times be a 
preferred method of assessment. 

Currently, the most frequently used self-report measure of the Five-Factor Model 
of personality is the Costa and McCrae (1992) NEO-PI-R (Briggs, 1992; Widiger & 
Trull, 1997). An attraction of the SIFFM is that it was coordinated closely with the 
NEO-PI-R. The interview questions included within the SIFFM are probes for the 
assessment of the domains and facets of the FFM as described and assessed by the 
NEO-PI-R. However, the SIFFM also attempts to assess systematically both the 
normal/adaptive and abnormal/maladaptive variants of the personality traits that 
comprise the Five-Factor Model of personality. The SIFFM includes questions that 
specifically target features indicative of maladaptive variants of each of the 30 facets 
of the NEO-PI-R. Because our interview assesses adaptive and maladaptive levels of 
personality traits, we believe that the SIFFM will appeal to clinicians and research- 
ers who are interested in evaluating individuals for personality pathology. 


Format of the SIFFM, scoring, and interpretation 


The SIFFM assesses the five domains of personality functioning that represent the 
Five-Factor Model (FFM) of personality. Table 1 presents an overview of the domains 
and facets of personality assessed by the SIFFM. 

Table 2. Example of three SIFFM items and anchor points for scoring 

9. Do you consider yourself to be a depressed person? (IF YES: Do you often feel worthless, 
   lonely, or blue? How long do these periods last? How often do you feel happy?) 
      0   NO; does not consider self a depressed person. 
      1   YES; considers self a depressed person; feels down a significant 
          amount of the time. 
      2   YES; considers self a depressed person; rarely (if ever) feels happy; 
          depressive periods last a long time. 

29. Do you prefer to do most activities with other people or alone? IF WITH OTHER PEOPLE: 
    Are you rarely by yourself? 
      0   Prefers to do most activities alone 
      1   Prefers to do most activities with other people 
      2   Rarely alone 

32. Do you have many friends? IF NO: Do you have any close friends or confidants? 
      2   YES; does have many friends 
      1   NO; does not have many friends 
      0   NO; does not have any close friends or confidants 


Length 


The SIFFM contains 120 interview items and requires approximately 1 hour to 
administer. It contains 4 initial probes for each of the 30 facets 
covered in the Five-Factor Model of personality (i.e., 24 probes for each of the five 
major domains of the FFM). 


Instructions 


Scores on the SIFFM reflect the degree to which a particular personality trait is pre- 
sent. Therefore, the interviewer provides appropriate instructions for answering 
SIFFM items, such that respondents’ answers reflect their "usual selves." Specifi- 
cally, the following instructions are provided: 


"There are no "right" or "wrong" answers to the questions I will be asking you. The 
questions deal with how you see yourself and how others may see you. It is important 
that you give your honest opinion and that your answer reflect the way you usually 
are. That is, answer the questions in a way that describes your "usual self."" 


Further, because the level of a personality trait is being assessed, it is important 
that interviewers ask for examples of behavior that demonstrate the trait in question, 
when appropriate. The SIFFM questions are relatively straightforward, and one can 
generally trust that the respondents have adequately understood them and that their 
responses can be taken at face-value. However, a potential advantage of semi- 
structured interviews relative to self-report inventories is the opportunity to ask for 
examples and illustrations of the respondents' opinions and self-descriptions to en- 
sure that they have in fact adequately understood the intention and meaning of the 
test items. In many places throughout the interview, we prompt the interviewer to 
ask for examples. However, the interviewer should feel free to ask for examples 
throughout the interview in order to collect more information on the trait. 


Format for rating interview responses 


Answers for each SIFFM item are rated on a 3-point scale (0, 1, 2) that is ordinal in 
nature. A higher score indicates that the trait in question is present to a greater degree. 
For example, a score of “1” on SIFFM question #9, assessing the facet of 
Depression, indicates that the respondent considers himself or herself to be a "depressed 
person" and feels worthless, lonely, or blue a significant amount of the time (see Table 
2). A score of “2” on this interview item indicates that, in addition to considering himself 
or herself to be a depressed person, the respondent rarely feels happy and the depressive 
periods may last for quite some time. Therefore, in this example, a higher score 
indicates a greater degree or severity of trait depression. 

Our choice of a 3-point scale allows for variability in scores and is the direct re- 
sult of pilot testing different response formats. In these preliminary studies, highly 
reliable scores for SIFFM items were obtained using a 3-point rating scale. 


Scoring SIFFM items 


The administration and scoring of individual SIFFM responses are relatively straightforward. 
Each SIFFM item follows a similar format. An initial (probe) question is 
posed, and more questions may or may not be asked depending on the response to 
this initial question. For example, Table 2 presents SIFFM item #9. The initial probe 
question is, “Do you consider yourself to be a depressed person?" If the respon- 
dent's answer is "No" or strongly suggests a negative answer (e.g., "Only on very 
rare occasions"), the interviewer records this response and then moves on to the next 
SIFFM item. On the other hand, if a “Yes” or positive response is offered, the interviewer 
proceeds to the next set of questions that are enclosed in parentheses. In our 
example, SIFFM item #9 (Depression), the interviewer would then ask, "Do you 
often feel worthless, lonely, or blue? How long do these periods last? How often do 
you feel happy?" 

Under each set of questions for each item are the scores (0, 1, 2) and operational 
guidelines for each possible score. The scoring of SIFFM responses is facilitated by 
the provision of descriptions of responses necessary to receive each of the three pos- 
sible scores. These guidelines also serve to prompt the interviewer to ask additional 
questions of the respondent in order to obtain the information necessary to provide 
the most accurate score. The interviewer simply matches the answer(s) to each item 
with the appropriate descriptor and then circles that score. 

Two additional comments regarding administration and scoring are warranted. 
First, although a positive ("Yes") response to the initial probe question typically 
indicates that the interviewer should proceed to additional questions for each item, 
this is not always the case. To cite two examples, the probe question for SIFFM item 
#29 (Gregariousness) is "Do you prefer to do most activities with other people or 
alone?" Obviously, this is not a yes/no question. In this case, as indicated in the text 
that is in bold and in all capital letters (IF WITH OTHER PEOPLE), the interviewer 
proceeds if the respondent's answer indicates that he or she prefers to do 
most activities with other people. A second example is SIFFM item #32 (Gregariousness). 
The probe question is, "Do you have many friends?" In this case, a “No” 
response will lead the interviewer to proceed to the additional questions. 

This brings up the second issue. Because we attempted to provide a relative balance 
of questions that assess high versus low levels of the trait in question, the order 
of the scores for SIFFM items is sometimes reversed. For example, note that the 
order of the scores for item #32 is 2, 1, 0 instead of 0, 1, 2. This reverse order is 
necessary because this item assesses low levels of the trait in question, in this 
case the trait of gregariousness. A "Yes" answer to the probe (“Do you have many 
friends?") indicates relatively higher levels of gregariousness and warrants a score of 
"2." However, a "No" to the probe followed by a response indicating no close 
friends or confidants would earn a score of "0." To simplify matters (so reverse-score 
algorithms do not have to be employed at a later stage in scoring, and so the 
meaning of individual item scores as recorded in the interview booklet is clear), we 
provided the appropriate score next to the respective scoring guideline. 


Calculating SIFFM facet and domain scores 


SIFFM facet and domain scores are calculated by adding up the relevant item scores. 
Because SIFFM items are organized by domain and facet, these calculations are 
fairly simple. In the interview booklet, the four items on each page make up a facet 
score. For example, SIFFM items #1 through #4 target the Neuroticism facet of 
Anxiety. The sum of the obtained scores for these four items is the Anxiety facet 
score. The sum of the facet scores for Anxiety, Angry Hostility, Depression, Self- 
Consciousness, Impulsiveness, and Vulnerability is the Neuroticism domain score. 
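The summation just described can be sketched in a few lines. The sketch is illustrative rather than an official scoring routine: it assumes the 120 item scores are supplied in booklet order, that each facet occupies four consecutive items, and that domains and facets follow the order of Table 1 (the text confirms this only for items #1 to #4, #9, #29, and #32). The function name `score_siffm` is hypothetical.

```python
# Domains and facets in the order of Table 1 (assumed to match item order).
DOMAINS = {
    "Neuroticism": ["Anxiety", "Hostility", "Depression",
                    "Self-Consciousness", "Impulsiveness", "Vulnerability"],
    "Extraversion": ["Warmth", "Gregariousness", "Assertiveness",
                     "Activity", "Excitement Seeking", "Positive Emotions"],
    "Openness to Experience": ["Fantasy", "Aesthetics", "Feelings",
                               "Actions", "Ideas", "Values"],
    "Agreeableness": ["Trust", "Straightforwardness", "Altruism",
                      "Compliance", "Modesty", "Tendermindedness"],
    "Conscientiousness": ["Competence", "Order", "Dutifulness",
                          "Achievement Striving", "Self-Discipline",
                          "Deliberation"],
}
ITEMS_PER_FACET = 4


def score_siffm(item_scores):
    """Return (facet_scores, domain_scores) from 120 item scores of 0, 1, or 2."""
    if len(item_scores) != 120:
        raise ValueError("expected 120 item scores")
    if any(s not in (0, 1, 2) for s in item_scores):
        raise ValueError("item scores must be 0, 1, or 2")
    facet_scores, domain_scores = {}, {}
    start = 0
    for domain, facets in DOMAINS.items():
        for facet in facets:
            # Four consecutive items make up one facet score.
            facet_scores[facet] = sum(item_scores[start:start + ITEMS_PER_FACET])
            start += ITEMS_PER_FACET
        # A domain score is the sum of its six facet scores.
        domain_scores[domain] = sum(facet_scores[f] for f in facets)
    return facet_scores, domain_scores


# Hypothetical respondent: every item scored 1, except items #9 to #12.
responses = [1] * 120
responses[8:12] = [2, 2, 1, 0]
facets, domains = score_siffm(responses)
print(facets["Depression"], domains["Neuroticism"])
```

Because reverse-ordered items already carry the appropriate score in the interview booklet, no reverse-keying step is needed before summing.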
Most test manuals provide "normative" data so that obtained scores can be com- 
pared to the mean scores of various groups of respondents (e.g., community resi- 
dents, clinic outpatients, etc.). In the case of the SIFFM, we did not provide norma- 
tive data for several reasons. First, very large and representative samples are neces- 
sary to support this approach. For example, in order to obtain a representative sam- 
ple of community residents, it is necessary to sample large numbers of randomly 
selected individuals who represent demographic features in the same proportions as 
those found in the population at large. Needless to say, this is a tremendous under- 
taking that can only be approximated at best. Further, this requires a great deal of 
time and expense — especially in the case of a structured interview. Only one semi- 
structured interview for mental, psychiatric disorders has obtained normative data 
(i.e., the interview used in the National Institute of Mental Health epidemiological 
research), due in large part to the costs of such a data collection (Rogers, 1995). In 
addition, in many cases, these community-based samples are significantly different 
from the individuals sampled for a particular study. For example, the economic and 
racial backgrounds of a sample of urban, inner-city participants are unlikely to be 
similar to those from a U.S. Census-matched normative group. For these and other 
reasons, we (along with others) advocate the collection of "local norms" if the clinician 
or researcher prefers to contrast obtained scores with mean scores from a normative 
or comparison group. 

A second reason why we have not collected or presented normative data concerns 
the way the SIFFM was constructed. One of the major distinguishing features of this 
interview is that maladaptivity or dysfunction associated with these personality traits 
is assessed in the SIFFM items. Because of this feature, SIFFM scores to some ex- 
tent reflect not only the level of a personality trait but also the degree to which it is 
problematic. This is quite different from the typical self-report inventory where high 
scores suggest high levels of the trait in question but do not necessarily indicate dys- 
function. For example, two individuals may obtain the exact same high score on a 
measure of the trait of altruism; however, for one individual this level of the trait is 
adaptive whereas for the other it is maladaptive (i.e., he or she may have been exploited 
or “used” by others). SIFFM scores not only reflect the degree to which the 
trait is present, but also suggest the level of dysfunction that accompanies the trait in 
question, regardless of the normative frequency or prevalence of the respective 
dysfunction within a particular population. 


Interpretation 


There are several possible ways to interpret SIFFM scores. At the item level, scores 
and the corresponding scoring guidelines can be examined to make inferences regarding 
the intensity and dysfunctionality of a trait. Because there are 120 items and 
individual item scores are less reliable than composite scores (i.e., facet or domain 
scores), the exclusive use of this interpretive strategy is not recommended, nor is it 
likely to be practical. We recommend that item-level interpretation be used only 
when facet or domain scores raise the possibility of dysfunction. 

Regarding the interpretation of SIFFM facet scores (sum of four SIFFM item 
scores), we offer the following guidelines: 


Scores 0 to 2: salient LOW level of trait 
Scores 3 to 5: MODERATE level of trait 
Scores 6 to 8: salient HIGH level of trait 


These ranges are based on the following rationale: salient LOW level scores are 
defined at the lower end (score = 0) by four item scores of 0 and at the upper end 
(score = 2) by at least two item scores of 0 and no item score > 1, and salient HIGH 
level scores are defined at the lower end (score = 6) by at least two item scores of 2 
and no item score < 1 and at the upper end (score = 8) by four item scores of 2. Although 
there are alternative ways of obtaining facet scores in these ranges, these 
general guidelines should be useful in identifying the level of a facet trait endorsed 
by the interview respondent. 
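The facet guidelines and their rationale can be checked by enumeration. The following sketch (the function name `facet_level` is hypothetical) classifies a facet score and then lists all 81 possible patterns of four item scores to confirm that every pattern meeting the stated rule falls in the corresponding range.

```python
from itertools import product


def facet_level(score):
    """Classify a SIFFM facet score (sum of four items scored 0-2)."""
    if not 0 <= score <= 8:
        raise ValueError("facet scores range from 0 to 8")
    if score <= 2:
        return "LOW"
    if score <= 5:
        return "MODERATE"
    return "HIGH"


# All 3**4 = 81 possible patterns of four item scores.
patterns = list(product((0, 1, 2), repeat=4))

# Rationale for LOW: at least two item scores of 0 and no item score > 1.
low_rule = [p for p in patterns if p.count(0) >= 2 and max(p) <= 1]
# Rationale for HIGH: at least two item scores of 2 and no item score < 1.
high_rule = [p for p in patterns if p.count(2) >= 2 and min(p) >= 1]

low_ok = all(facet_level(sum(p)) == "LOW" for p in low_rule)
high_ok = all(facet_level(sum(p)) == "HIGH" for p in high_rule)
print(low_ok, high_ok)

# An "alternative way" of reaching the LOW range that the rule does not cover.
print(facet_level(sum((2, 0, 0, 0))))
```

The last line illustrates the caveat in the text: a pattern such as 2, 0, 0, 0 also sums to 2, although it contains an item score above 1.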

For salient low and salient high levels, some degree of dysfunction is suggested 
but not necessarily indicated. For example, salient high levels of Trust (a facet of 
Agreeableness) could be adaptive (e.g., person has faith in good intentions of others, 
readily discusses personal insecurities, problems, and vulnerabilities with others, and 
is able to place a dependence on and faith in others), but a maladaptive variant 
would involve being excessively gullible or naive, failing to recognize that some 
persons should not be trusted, and failing to take practical cautions with respect to 
personal safety or property. Similarly, salient low levels of Trust could be adaptive 
(e.g., person is reasonably or effectively skeptical or dubious regarding proposals or 
intentions of others) but a maladaptive variant would involve being paranoid and 
suspicious of most persons, readily perceiving malevolent intentions within benign, 
innocent remarks, and often becoming involved in arguments with friends, col- 
leagues, or associates due to an unfounded belief that he or she is being mistreated, 
exploited, or victimized. As discussed below, the judgement of dysfunction should 
be based on an examination of an individual's responses to SIFFM items that spe- 
cifically assess for impairment. 

Regarding the interpretation of SIFFM domain scores (sum of 24 SIFFM item 
scores), the following guidelines are offered: 


Scores 0 to 12: salient LOW level of trait 
Scores 13 to 35: MODERATE level of trait 
Scores 36 to 48: salient HIGH level of trait 


These ranges are based on the following rationale: salient LOW level scores are 
defined at the lower end (score = 0) by 24 item scores of 0 and at the upper end 
(score = 12) by at least 12 item scores of 0 and no item score > 1; and salient HIGH 
level scores are defined at the lower end (score = 36) by at least 12 item scores of 2 
and no item score < 1 and at the upper end (score = 48) by 24 item scores of 2. 
Again, there are alternative ways of obtaining scores in these ranges. However, these 
general guidelines should be useful in identifying the level of a domain trait en- 
dorsed by the interview respondent. 
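The facet and domain cutoffs appear to follow one rule: with n items scored 0 to 2, the LOW range ends where half the items are scored 0 and the rest 1, and the HIGH range begins where half the items are scored 2 and the rest 1. The sketch below states this reading of the rationale in code; the helper names `cutoffs` and `level` are hypothetical and not part of the SIFFM materials.

```python
def cutoffs(n_items):
    """Return (low_max, high_min) for a sum of n_items scores of 0, 1, or 2.

    low_max:  half the items scored 0 and half scored 1.
    high_min: half the items scored 2 and half scored 1.
    """
    if n_items <= 0 or n_items % 2:
        raise ValueError("n_items must be a positive even number")
    half = n_items // 2
    return half, 2 * n_items - half


def level(score, n_items):
    """Classify a facet (n_items=4) or domain (n_items=24) score."""
    if not 0 <= score <= 2 * n_items:
        raise ValueError("score outside the possible range")
    low_max, high_min = cutoffs(n_items)
    if score <= low_max:
        return "LOW"
    if score >= high_min:
        return "HIGH"
    return "MODERATE"


print(cutoffs(4), cutoffs(24))
print(level(25, 24))
```

For 4 items the rule yields the facet cutoffs of 2 and 6, and for 24 items the domain cutoffs of 12 and 36, matching the guidelines listed above.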

We want to emphasize that these cutoff guidelines are rationally based. Therefore, 
future research is necessary to empirically evaluate their utility. Further, it is 
important to note that the Salient Low or Salient High score designations do not 
necessarily indicate maladaptivity or dysfunction, only that it is more likely. The 
judgement of maladaptivity or dysfunction should be made based on an individual's 
responses to SIFFM items that specifically assess for dysfunction. 


Critical facets and items 


As noted previously, the SIFFM taps a wide range of maladaptive and adaptive per- 
sonality features. In contrast to most FFM instruments, the SIFFM contains many 
items that assess maladaptive features associated with major personality traits. For 
example, maladaptive variants of antagonism and introversion are well assessed by 
the NEO-PI-R (Costa & McCrae, 1992) but perhaps not all of the maladaptive vari- 
ants of agreeableness or extraversion are equally well represented (Widiger & Costa, 
1994). The weaker representation of maladaptive agreeableness and extraversion 
relative to antagonism and introversion (respectively) in the NEO-PI-R is probably 
consistent with the extent to which traits of agreeableness and extraversion are in 
fact generally or typically more adaptive than traits of antagonism and introversion 
(respectively). Nevertheless, the SIFFM attempts to provide a more comprehensive 
and systematic assessment of the maladaptive variants of all of the poles of the five 
domains of the FFM in order to increase the potential utility and applicability of the 
FFM for clinical settings in which the maladaptive variants of agreeableness and 
extraversion (for example) are of considerable importance and interest. 

Further, many of the interview's items assess personality features that are directly 
relevant to the DSM-IV personality disorders (APA, 1994). Those traits that are 
most relevant to the individual DSM-IV personality disorders are highlighted in Ta- 
ble 3. This table summarizes hypothesized SIFFM facet-personality disorder rela- 
tionships for each of the 10 official DSM-IV personality disorders. 

As a further clinical aid, we compiled a list of those SIFFM items that are directly 
relevant to the criteria for each of the official DSM-IV personality disorders. These 
SIFFM items explicitly tap DSM-IV personality disorder symptoms, and therefore 
have high face validity. The clinician may choose to view these as "critical items" 
for each personality disorder. These items are listed in the SIFFM manual (Trull & 
Widiger, 1997). 

Keep in mind two points, however. First, the SIFFM was not designed to directly as- 
sess all criteria of the DSM-IV personality disorders. Many of the personality traits 
relevant to the DSM-IV personality disorders are assessed with SIFFM items, but 
the SIFFM is not directly keyed to DSM-IV personality disorder criteria. Second, 
the potential advantage of the SIFFM is that it provides more comprehensive cover- 
age of maladaptive personality traits than DSM-IV personality disorder assessment 
instruments. The SIFFM taps many maladaptive features that are not considered in 
the DSM-IV personality disorder diagnostic system. For example, SIFFM item #68 
(an Openness to Ideas item) asks: 


Would you say that your ideas are rather traditional or old-fashioned? 
(IF YES: Are you reluctant to consider new or alternative ideas of other cultures or 
perspectives?) 


Clearly, being "closed" to ideas regarding different lifestyles, perspectives, or 
cultures can be maladaptive. Such a trait may cause problems in relationships with 
others both within and outside of an occupational context. However, this maladap- 
tive personality feature is not directly tied to any DSM-IV personality disorder. 

We hope these examples point out the broad range of personality traits that are 
assessed by SIFFM items. Because of the SIFFM's comprehensiveness, SIFFM in- 
terview scores are clinically informative from both a formal diagnostic (i.e., DSM- 
IV) as well as a more general assessment perspective. 


Development of the SIFFM 


As mentioned previously, the SIFFM targets the FFM constructs that are assessed by 
Table 3. Summary of SIFFM trait and DSM-IV personality disorder relationships 


SIFFM Facet   [column headings: the 10 DSM-IV personality disorders; labels not recoverable] 
Neuroticism 
Anxiety H H H H L 
Hostility S: H [. H H H H 
Depression H H L H L 
Self-Consciousness L H H H H L 
Impulsiveness H H 
Vulnerability H H H H H/L H L 
Extraversion 
Warmth L H iL H H 
Gregariousness L iL Ё H H H 
Assertiveness iL L H H H 
Activity L E L H H 
Excitement Seeking | L H H 
Positive Emotions Е | [Е H H/L 
Openness to Experience 
Fantasy H (8 Н H H 
Aesthetics H L 
Feelings L L E H H 
Actions H L L H 
Ideas H L L L 
Values H L Ё 
Agreeableness 
Trust L L H H L | 
Straightforwardness H L L L 
Altruism H L L [Ё L 
Compliance L H |; L L 
Modesty H H L L 
Tendermindedness L H H 5 L 
Conscientiousness 
Competence L L H H 
Order H L 
Dutifulness H L 
Achievement Striving H H L L 
Self-Discipline H L L 
Deliberation L H L L L 


Note: H = high on the trait, L = low on the trait. 
were generated by five researchers with backgrounds in clinical psychology and the 
FFM. In response to feedback indicating difficulties in the administration or under- 
standing of any items or in the scoring of item responses, modifications to the wor- 
ding of items were made. In some cases, items were deleted or added at this initial 
stage of development. The first formal empirical examination of the SIFFM items 
involved the administration of a preliminary 169-item version to both clinical and 
nonclinical samples. This initial version included approximately 6 items per facet, 
reverse-scored items, and items targeting maladaptive levels of the traits. 

The criteria we used to determine which items to retain for the final version of the 
SIFFM included: (a) high corrected item-total correlation (high internal consis- 
tency); (b) item score correlation with its targeted NEO-PI-R facet and domain score 
(convergent validity), and relative lack of correlation with non-targeted NEO-PI-R 
facets and domain scores (discriminant validity); (c) reasonable variability in item 
scores as indicated by an examination of item means and standard deviations; and 
(d) encouraging factor loadings based on the results of a series of 30 principal axis 
factor analyses that included all items targeting a specific facet, the corresponding 
NEO-PI-R facet score, and non-targeted domain scores from the NEO-PI-R. Based 
on these criteria, we selected four SIFFM items for each of the 30 facets of the FFM. 
The final version of the SIFFM includes 120 items: four items for each facet, 24 
items for each domain. Further, 32 of the items were reverse-scored. 
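
Criterion (a) can be illustrated with a short computation. The sketch below assumes the usual definition of a corrected item-total correlation, in which each item is correlated with the sum of the remaining items in its scale; the function name and the toy data are hypothetical and do not reproduce the analyses used to build the SIFFM.

```python
import numpy as np


def corrected_item_total(scores):
    """Correlate each item with the sum of the other items in the scale.

    scores: array-like, one row per respondent and one column per item.
    """
    x = np.asarray(scores, dtype=float)
    totals = x.sum(axis=1)
    correlations = []
    for j in range(x.shape[1]):
        rest = totals - x[:, j]            # scale total without item j
        correlations.append(np.corrcoef(x[:, j], rest)[0, 1])
    return np.array(correlations)


# Hypothetical facet with three consistent items and one item that was
# (wrongly) left unreversed, so it runs against the other three.
toy = np.array([
    [0, 0, 0, 2],
    [1, 1, 1, 1],
    [2, 2, 2, 0],
    [1, 1, 1, 1],
])
item_total_r = corrected_item_total(toy)
```

An item with a low or negative value in this kind of check would be a candidate for deletion or for reverse-scoring.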


Psychometric properties of the SIFFM 


Reliability 


Three types of reliability have been examined: inter-rater reliability, internal consis- 
tency, and test-retest reliability. 


Inter-rater reliability 


The degree to which interview responses can be scored reliably is of utmost impor- 
tance. A slick, crafty interview that cannot be scored reliably will not be useful to 
clinicians or to researchers. Fortunately, all studies that have evaluated the inter-rater 
reliability of SIFFM scores have reported very encouraging results. Trull et al. 
(1998) reported high inter-rater reliability coefficients for independent ratings of 
interview audio-tapes in samples of 187 undergraduates and of 46 outpatients re- 
ceiving treatment for a psychological condition. The undergraduates were inter- 
viewed by advanced undergraduate students, graduate students in clinical psychol- 
ogy, and a licensed clinical psychologist, while the outpatients were interviewed by 
graduate students in clinical psychology and a licensed clinical psychologist. Audio- 
tapes of 35 per cent (66 of 187) of the SIFFM interviews with undergraduates and of 
43 per cent (20 of 46) of the SIFFM interviews with outpatients were reviewed and 
scored independently by a reliability checker. Intraclass correlation coefficients 
(ICCs; Shrout & Fleiss, 1979) were calculated comparing the independent ratings of 
the scores on each of the 30 SIFFM facets as well as on each of the five SIFFM do- 
main scores. The ICCs for almost all domains and facets were exceptionally high 
(i.e., greater than .90), and the reliability values for the undergraduate and clinical 
participants, respectively, did not differ significantly. 
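
For readers who wish to compute a comparable index, the sketch below implements one of the Shrout and Fleiss (1979) coefficients, ICC(2,1) (two-way random effects, single rating, absolute agreement). This is an assumption made for illustration: the text does not state which ICC form was used in the studies described here, and the function name `icc_2_1` and the example ratings are hypothetical.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1) from a targets-by-raters array of scores."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape                                   # n targets, k raters
    grand = x.mean()

    ss_targets = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_raters = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_total = ((x - grand) ** 2).sum()
    ss_error = ss_total - ss_targets - ss_raters

    ms_targets = ss_targets / (n - 1)                # between-targets mean square
    ms_raters = ss_raters / (k - 1)                  # between-raters mean square
    ms_error = ss_error / ((n - 1) * (k - 1))        # residual mean square

    return (ms_targets - ms_error) / (
        ms_targets + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )


# Hypothetical facet scores for six interviews, each scored by the
# interviewer (column 0) and by an independent reliability checker (column 1).
facet_scores = [[6, 6], [2, 3], [8, 8], [4, 4], [5, 5], [7, 6]]
agreement = icc_2_1(facet_scores)
```

Because this form penalizes systematic differences between raters as well as random disagreement, it is a fairly strict index of scoring agreement.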

The inter-rater reliability of SIFFM scores has also been evaluated in two other 
studies, one assessing clinical outpatients and one assessing undergraduate partici- 
pants. Trull, Vieth, Wolfenstein, and Burr (2002) administered the SIFFM to 52 out- 
patients (40 women, 12 men; M age = 36.00 [14.4]) drawn from several community- 
based health clinics. Thirty of these audio-taped SIFFM interviews were reviewed 
and scored independently by a reliability checker. Inter-rater reliability indices 
(ICCs) were excellent for both SIFFM domain scores (all ICCs = .99) and SIFFM 
facet scores (range = .94 to 1.00). 

Trull et al. (2002) also reported on the inter-rater reliability of SIFFM scores in a 
sample of 150 college undergraduates (75 men, 75 women; M age = 18.41 [0.75]). 
Ninety-three of these participants had scored above threshold on a measure of Bor- 
derline Personality Disorder features, suggesting the presence of BPD pathology. 
Thirty-eight of the 150 audio-taped SIFFM interviews were reviewed and scored 
independently by a reliability checker. All SIFFM domain ICCs were .99, and the 
ICCs for SIFFM facet scores ranged from .93 to 1.00. 


Internal consistency 


Trull et al. (1998) computed Cronbach's alpha coefficient for each SIFFM domain 
scale in their undergraduate sample and clinical sample, separately. For the under- 
graduate sample, the internal consistency coefficients ranged from .71 (Agreeable- 
ness) to .84 (Neuroticism and Extraversion), while in the clinical sample the range 
was from .72 (Agreeableness) to .97 (Neuroticism and Extraversion). Although, as 
expected (given that internal consistency is at least partially a function of the 
number of items in a scale), SIFFM facet internal consistencies were lower, most of the values for these 
two samples were in the acceptable range. 
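
The dependence of alpha on scale length can be made concrete. The sketch below computes Cronbach's alpha from item scores and applies the Spearman-Brown formula to project the reliability of a lengthened scale; both function names are hypothetical, and the Spearman-Brown projection assumes that the added items are comparable to the existing ones, which need not hold for facets grouped into a domain.

```python
import numpy as np


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items array of item scores."""
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]                                  # number of items
    sum_item_var = x.var(axis=0, ddof=1).sum()      # sum of item variances
    total_var = x.sum(axis=1).var(ddof=1)           # variance of scale totals
    return k / (k - 1) * (1.0 - sum_item_var / total_var)


def spearman_brown(reliability, length_factor):
    """Projected reliability when a scale is lengthened by length_factor."""
    return length_factor * reliability / (
        1.0 + (length_factor - 1.0) * reliability
    )


# A 4-item facet with alpha = .60, lengthened sixfold to 24 comparable items.
projected = spearman_brown(0.60, 6)
```

Under these assumptions a four-item scale with an alpha of .60 would project to about .90 at 24 items, which shows why facet-level coefficients are expected to fall below domain-level ones.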

In a sample of 52 clinical outpatients, Trull et al. (2002) reported that the median 
internal consistency coefficient (alpha) for SIFFM domains was .74 while the me- 
dian alpha for SIFFM facet scores was .62. In the sample of 150 college students 
described above, Trull et al. (2002) reported that the median internal consistency 
coefficient for SIFFM domains was .77 while the median alpha for SIFFM facet 
scores was .60. 
Table 4. Test-retest reliabilities and stability indices for SIFFM domain and facet scales (N = 44) 


                          Pearson Correlation   Intraclass Correlation 
SIFFM Scale                        r                     ICC 
Neuroticism                       .82                    .81 
Anxiety                           .82                    .65 
Hostility                         n                      ant 
Depression                        .78                    .78 
Self-Consciousness                775 E                  125 
Impulsiveness                     .61                    .59 
Vulnerability                     .69                    .67 
Extraversion                      .93                    .93 
Warmth                            .90                    .90 
Gregariousness                    .85                    .85 
Assertiveness                     .85                    .85 
Activity                          .77                    .77 
Excitement Seeking                .70                    -20 
Positive Emotions                 .81                    .81 
Openness to Experience            .89                    .89 
Fantasy                           .83                    .83 
Aesthetics                        .87                    .87 
Feelings                          .78                    .78 
Actions                           .58                    .58 
Ideas                             .78                    .78 
Values                            .81                    .81 
Agreeableness                     .88                    .88 
Trust                             5                      .79 
Straightforwardness               .78                    .78 
Altruism                          .81                    .81 
Compliance                        .66                    .66 
Modesty                           .64                    .63 
Tendermindedness                  .75                    .75 
Conscientiousness                 .90                    .90 
Competence                        .78                    .78 
Order                             .87                    .87 
Dutifulness                       .65                    .65 
Achievement Striving              .64                    .64 
Self-Discipline                   .79                    .79 
Deliberation                      .69                    .68 


Test-retest reliability 


To date, only one study has examined the test-retest reliability of SIFFM scores. 
Trull et al. (1998) reported the results of a small test-retest study of SIFFM scores. 
Participants were 44 undergraduate students who completed the SIFFM on two oc- 
casions separated by two weeks. As indicated in Table 4, Pearson correlations and 
Table 5. Correlations between SIFFM facet scores and corresponding NEO-PI-R facet scores 


                          N = 233        N = 187      N = 46 
SIFFM Facet               All Subjects   Undergrad    Clinic 
Neuroticism 
Anxiety                   .59            .50          .67 
Hostility                 .63            .60          .68 
Depression                .71            .64          .68 
Self-Consciousness        .68            .60          .71 
Impulsiveness             .41            .37          .25 
Vulnerability             .63            .49          .65 
Extraversion 
Warmth                    .59            .59          .45 
Gregariousness            .47            .39          .63 
Assertiveness             .81            .79          .81 
Activity                  .60            .58          .48 
Excitement Seeking        .58            .51          293 
Positive Emotions         .71            .64          .61 
Openness to Experience 
Fantasy                   .62            .61          .65 
Aesthetics                .63            .67          .39 
Feelings                  .36            .49          .30 
Actions                   .39            .45          .33 
Ideas                     .57            .62          .49 
Values                    .29            .26          .14 
Agreeableness 
Trust                     .70            .72          .65 
Straightforwardness       .60            .61          .41 
Altruism                  .42            .41          .54 
Compliance                .54            .54          .51 
Modesty                   .42            .37          .35 
Tendermindedness          .38            .36          .37 
Conscientiousness 
Competence                .53            .51          .57 
Order                     .77            .74          .84 
Dutifulness               .27            .23          .35 
Achievement Striving      .67            .65          .68 
Self-Discipline           .72            .71          al 
Deliberation              .63            .62          .68 


ICCs for the SIFFM domain scores were very high (range = .81 to .93), as were the 
reliability indices for facet scores in this sample (average r for facet scores = .76, 
average ICC = .75). 


Validity 


The validity of SIFFM scores has been examined in several studies. Perhaps the 
most obvious validity assessment involves an examination of the relations between 
SIFFM domain and facet scores and corresponding scores from the NEO-PI-R. Trull 
et al. (1998) reported that SIFFM domain and NEO-PI-R domain scores were highly 
related to each other. Specifically, combining the clinical and nonclinical samples, 
the convergent validity coefficients for the Neuroticism, Extraversion, Openness, 
Agreeableness, and Conscientiousness domains were .77, .84, .65, .75, and .82, re- 
spectively. In general, SIFFM facet score correlations with corresponding facet 
scores from the NEO-PI-R were in the moderate to high range. These convergent 
validity results were replicated across clinical and nonclinical samples. Table 5 pres- 
ents these correlations. 

Axelrod, Widiger, Trull, and Corbitt (1997) reported the convergent and dis- 
criminant validity coefficients for an initial draft of the SIFFM's assessment of the 
facets of Agreeableness. Each of the respective Agreeableness facet scales from the 
NEO-PI-R and SIFFM obtained significant positive correlations. Most of the dis- 
criminant validity coefficients were statistically nonsignificant, and in no case was a 
discriminant validity coefficient of the same magnitude as the corresponding con- 
vergent validity coefficient. 


Personality disorder scores 


Because SIFFM scores are believed to reflect maladaptive levels of major personal- 
ity traits (and because personality disorders are purported to be comprised of mal- 
adaptive personality traits), SIFFM scores should be significantly related to person- 
ality disorder scores. Trull et al. (1998) evaluated the pattern of correlations between 
SIFFM scores and scores on a self-report measure of DSM-III-R personality disor- 
ders (the PDQ-R; Hyler & Rieder, 1987) in the combined sample of clinical and 
nonclinical participants (N = 232). Results indicated the majority of personality dis- 
order constructs were positively correlated with SIFFM Neuroticism scores, nega- 
tively correlated with SIFFM Extraversion scores, and negatively correlated with 
SIFFM Conscientiousness scores. These results are consistent with expectations 
based on an understanding of the FFM and with previous studies that examined the 
relations between personality disorder constructs and scores from alternative FFM 
measures (e.g., see Costa & McCrae, 1990; Soldz et al., 1993; Trull, 1992). Table 6 
presents the zero-order correlations between SIFFM scores and PDQ-R symptom 
counts. 

Axelrod et al. (1997) demonstrated that a differentiation among the DSM-III-R 
personality disorders (APA, 1987) can be obtained at the facet level of the FFM. 
They indicated that DSM-III-R Antisocial, Borderline, Narcissistic, Paranoid, and 
Passive-Aggressive personality disorders all correlated significantly with the broad 
domain of Antagonism (ranging in value from .26 to .50), but each of these person- 
ality disorders could also be distinguished with respect to which facets of Antago- 
nism are primarily involved. For example, the Narcissistic Personality Disorder was 
primarily negatively related to the facets of Modesty (i.e., arrogance), Tender- 
Mindedness (i.e., tough-mindedness), and Altruism (i.e., exploitation), whereas the 
Paranoid Personality Disorder was primarily negatively related to just the facet of 
Trust (i.e., suspiciousness and paranoia). Low Tender-Mindedness (i.e., tough- 
mindedness) was evident in the Antisocial and Narcissistic personality disorders but 
not in the Borderline, Paranoid, or Passive-Aggressive personality disorders. Low 
Modesty (i.e., arrogance) related only to the Narcissistic Personality Disorder. 
Table 6. Correlations between SIFFM facet scores and PDQ-R personality disorder symptom counts 
Assertiveness -.23 -.24 -.19 -.52 -.35 -.11 -.24 -.17 -.07 -.12 -.18 -.10 .04 
Activity -.36 -.25 -.14 -.45 -.21 -.13 -.23 -.24 -.06 -.06 -.23 .03 .11 
Excite Seek -.29 -.07 -.06 -.40 -.25 -.09 -.10 -.16 -.07 .06 .02 .23 .27 
Trust -.27 -.39 -.54 -.31 -.02 -.26 -.29 -.27 -.13 -.29 -.34 -.21 -.24 
Straightforw .06 -.12 -.14 -.03 .04 -.12 -.26 -.02 -.10 -.28 -.16 -.25 -.37 
Altruism 211 -.08 -.20 -.01 .16 -.12 -.16 .11 -.05 -.13 -.02 -.18 -.13 
Compliance -.02 .01 -.16 .06 .19 -.06 -.08 .05 .11 -.05 -.09 -.08 -.21 
Modesty NES ИНИОСНИ 27 2209055713526 1-03. -11 16605816 
Tender Mind СОВ 05-15 IT 28.403 7:05 .15 20 .00 .00 -21 -29 
Order -.04 -.03 .07 -.12 -.14 -.06 -.22 -.06 -.03 -.03 -.04 .00 .11 
Achmt Strv -.23 -.25 -.07 -.26 -.27 -.12 -.39 -.21 715 -.05 -.21 -.27 .02 
Trull, Widiger, and Burr (2001) provided a more comprehensive and thorough 
study of the importance of considering the underlying facets of each of the domains 
of the FFM, testing in particular the predictions of Trull and Widiger (1997; see Ta- 
ble 3 of this chapter) regarding which FFM lower-order traits (i.e., facets) are most 
highly related to individual personality disorder constructs. Trull et al. (2001) evalu- 
ated specific/unique associations by controlling for personality disorder comorbid 
symptoms (i.e., partialling out symptoms of non-targeted personality disorders). In 
general, results supported the predictions of Trull and Widiger (1997) in that many 
of the significant relations between facets and personality disorder constructs held, 
even after controlling for comorbid personality disorder symptoms. For example, 
both the Dependent and Avoidant Personality Disorders (which are highly comorbid 
in clinical settings; Oldham et al., 1992) correlated highly with the broad domain of 
Neuroticism. However, the Dependent Personality Disorder was associated primar- 
ily and uniquely with the Neuroticism facets of Depression and Vulnerability, 
whereas the Avoidant Personality Disorder was associated primarily and uniquely 
with the Neuroticism facet of Self-Consciousness. Similarly, whereas both the 
highly comorbid Avoidant and Schizoid personality disorders correlated highly with 
Introversion, each could again be differentiated with respect to particular facets of 
Introversion. Schizoid Personality Disorder was associated primarily and uniquely 
with low Positive Emotions (along with low Gregariousness and low Warmth). 
Avoidant Personality Disorder, on the other hand, was associated primarily and 
uniquely with low Assertiveness and low Excitement-Seeking (along with low Gre- 
gariousness), consistent with the expected differentiations of these personality disor- 
ders at the facet level (Trull & Widiger, 1997). 
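
The idea of partialling out comorbid symptoms can be sketched as a partial correlation. The example below uses one common approach, assumed here for illustration: residualize both the facet score and the targeted disorder's symptom count on the symptom counts of the non-targeted disorders, then correlate the residuals. The function name and simulated data are hypothetical, and the exact procedure of Trull et al. (2001) may differ.

```python
import numpy as np


def partial_corr(x, y, covariates):
    """Correlation between x and y after removing the linear effect of covariates."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: intercept plus one column per covariate.
    z = np.column_stack([np.ones(len(x)), np.asarray(covariates, dtype=float)])
    x_resid = x - z @ np.linalg.lstsq(z, x, rcond=None)[0]
    y_resid = y - z @ np.linalg.lstsq(z, y, rcond=None)[0]
    return np.corrcoef(x_resid, y_resid)[0, 1]


# Simulated illustration: a facet score and a targeted symptom count that are
# related only through a comorbid disorder's symptom count.
rng = np.random.default_rng(0)
comorbid = rng.normal(size=500)
facet = comorbid + rng.normal(size=500)
target = comorbid + rng.normal(size=500)

zero_order = np.corrcoef(facet, target)[0, 1]
unique = partial_corr(facet, target, comorbid)
```

In this simulation the zero-order correlation is substantial while the partial correlation is near zero, which is the pattern that would indicate a facet-disorder relation carried entirely by comorbidity.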


Peer report of FFM traits 


Another method of establishing the validity of SIFFM scores involved the collection 
of peer/collateral data on the FFM. Trull et al. (1998) obtained peer reports of FFM 
traits on a sample of 55 undergraduates who also completed the SIFFM. For each 
participant, a male and a female peer (each of whom reported knowing the partici- 
pant well and for at least two years) completed the NEO-FFI (Costa & McCrae, 
1992), an inventory that yields scores on the five major dimensions of the FFM. 
Each peer NEO-FFI was standardized according to the gender norms corresponding 
to the sex of the target. A mean peer NEO-FFI score for each domain was computed 
by averaging the scores from the two peers for each participant. Zero-order correla- 
tions between participants' SIFFM scores and the mean peer NEO-FFI T scores 
were calculated. Convergent validity coefficients for FFM domains were .52 (Neu- 
roticism), .67 (Extraversion), .32 (Openness), .39 (Agreeableness), and .47 (Consci- 
entiousness). These convergent validity coefficients were higher than off-diagonal 
values in the corresponding rows and columns, suggesting that observer ratings of 
FFM traits converged on the same trait domains as assessed by the SIFFM. 
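
The peer-report analysis described above can be outlined in three steps: convert each peer's raw score to a T score, average the two peers, and compare each convergent correlation with the other values in its row and column. The sketch below is illustrative only; the helper names are hypothetical, and the norm mean and standard deviation passed to `t_score` would come from the published NEO-FFI norms, which are not reproduced here.

```python
import numpy as np


def t_score(raw, norm_mean, norm_sd):
    """Convert raw scores to T scores (mean 50, SD 10) using supplied norms."""
    return 50.0 + 10.0 * (np.asarray(raw, dtype=float) - norm_mean) / norm_sd


def mean_peer_score(peer_a_t, peer_b_t):
    """Average the T scores provided by the two peers of each participant."""
    return (np.asarray(peer_a_t, dtype=float) + np.asarray(peer_b_t, dtype=float)) / 2.0


def cross_correlations(a, b):
    """Pearson correlations between the columns of a and the columns of b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_z = (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)
    b_z = (b - b.mean(axis=0)) / b.std(axis=0, ddof=1)
    return a_z.T @ b_z / (len(a) - 1)


def diagonal_dominates(r):
    """For each domain, is the convergent r above every other value in its row and column?"""
    r = np.asarray(r, dtype=float)
    flags = []
    for i in range(r.shape[0]):
        others = np.concatenate([np.delete(r[i, :], i), np.delete(r[:, i], i)])
        flags.append(bool(r[i, i] > others.max()))
    return flags
```

With a participants-by-domains array of SIFFM scores and a matching array of mean peer T scores, `diagonal_dominates(cross_correlations(siffm, peers))` returns one flag per domain.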


Incremental validity 


Several studies have assessed the incremental validity of SIFFM scores in predicting 
personality disorder symptoms. If the SIFFM, indeed, does a better job of tapping 
into maladaptive variants of major personality traits, one would expect that SIFFM 
scores could account for significant amounts of variance in personality disorder 
scores above and beyond what could be accounted for by popular measures of major 
personality traits. 
Using a combined sample of clinical and nonclinical participants (N = 232), Trull 
et al. (1998) examined the incremental validity of SIFFM domain scores in predict- 
ing DSM-III-R personality disorder symptom scores as assessed by the PDQ-R. In a 
series of hierarchical regressions (using each personality disorder symptom score as 
the criterion, respectively), Trull et al. assessed the ability of SIFFM domain scores 
to predict PDQ-R scores once NEO-PI-R domain scores had been entered into the 
regression model. Results indicated that in all cases except for Paranoid, Obsessive 
Compulsive, and Narcissistic PDQ-R scores, SIFFM domain scores demonstrated 
incremental validity. The ability of SIFFM facets (hypothesized to be most highly 
related to the target personality disorder; see Table 3) to show incremental validity 
over the NEO-PI-R domain scores in predicting personality disorder scores on the 
PDQ-R was evaluated in a series of hierarchical regressions (Trull & Widiger, un- 
published data). Results suggested that the block of respective SIFFM facets ac- 
counted for a significant amount of the variance in the targeted personality disorder 
score over and above what could be accounted for by the NEO-PI-R domain scores 
in all cases except for Obsessive Compulsive personality disorder. These results are 
presented in Table 7. 

Trull et al. (2002) reported on a series of incremental validity analyses that fo- 
cused on the ability of SIFFM scores to account for variance in DSM-IV Borderline, 
Antisocial, and Histrionic symptom counts (as assessed by a semi-structured diag- 
nostic interview) above and beyond what could be accounted for by Temperament 
scores (i.e., Negative Temperament, Positive Temperament, Disinhibition) from the 
SNAP (Clark, 1993). Subjects included both clinical and nonclinical participants 
(combined: N = 202). In the first series of analyses, SIFFM domain scores accounted 
for a significant amount of additional variance in Borderline (ΔR² = .09, p < .0001), 
Antisocial (ΔR² = .06, p < .005), and Histrionic (ΔR² = .12, p < .0001) symptom 
counts, over and above that accounted for by SNAP Temperament scores. Using this 
same combined sample, a second series of analyses examined the incremental valid- 
ity of selected SIFFM facets believed to be most relevant to each of the three per- 
sonality disorders, respectively (see Trull & Widiger, 1997), over SNAP scales pur- 
ported to assess each personality disorder (Clark, 1993). Once again, the block of 
SIFFM facet predictors showed significant predictive ability over and above SNAP 
scores in accounting for variance in Borderline (ΔR² = .04, p < .05), Antisocial (ΔR² 
= .06, p < .05), and Histrionic (ΔR² = .16, p < .0001) symptom counts. 
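
The incremental validity analyses reported in this section follow the logic of hierarchical regression: fit a model with the first block of predictors, add the second block, and test the gain in explained variance. The sketch below shows that logic with ordinary least squares; the function names and simulated data are hypothetical, and the sketch returns the F statistic with its degrees of freedom rather than a p value.

```python
import numpy as np


def r_squared(predictors, y):
    """R-squared from an ordinary least squares fit with an intercept."""
    design = np.column_stack([np.ones(len(y)), predictors])
    coefficients = np.linalg.lstsq(design, y, rcond=None)[0]
    residuals = y - design @ coefficients
    centered = y - y.mean()
    return 1.0 - (residuals @ residuals) / (centered @ centered)


def incremental_validity(block1, block2, y):
    """Gain in R-squared when block2 is entered after block1, with its F test."""
    y = np.asarray(y, dtype=float)
    b1 = np.asarray(block1, dtype=float).reshape(len(y), -1)
    b2 = np.asarray(block2, dtype=float).reshape(len(y), -1)

    r2_step1 = r_squared(b1, y)
    r2_step2 = r_squared(np.column_stack([b1, b2]), y)

    df_num = b2.shape[1]                               # predictors added at step 2
    df_den = len(y) - b1.shape[1] - b2.shape[1] - 1    # residual df of full model
    delta_r2 = r2_step2 - r2_step1
    f_change = (delta_r2 / df_num) / ((1.0 - r2_step2) / df_den)
    return delta_r2, f_change, (df_num, df_den)


# Simulated illustration: the criterion depends on both blocks.
rng = np.random.default_rng(1)
step1_scores = rng.normal(size=200)      # stands in for the first block
step2_scores = rng.normal(size=200)      # stands in for the SIFFM block
symptoms = step1_scores + step2_scores + rng.normal(size=200)

delta_r2, f_change, dfs = incremental_validity(step1_scores, step2_scores, symptoms)
```

In the studies described above, the first block would hold the NEO-PI-R domain scores or the SNAP scales and the second block the SIFFM domain or facet scores.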

These incremental validity studies, however, should not be understood as sugges- 
tions that the SIFFM would provide a better or more valid assessment of the FFM 
than the NEO-PI-R. We expect that further research, particularly studies within the 
general population and community, would indicate that the NEO-PI-R provides a 
more reliable and valid assessment of the FFM than the SIFFM, and may even do so 
for some purposes within clinical settings (Widiger & Trull, 1997). The ideal use of 
the SIFFM will be as a supplement to, rather than as a replacement of, the NEO-PI-R. 
This is particularly the case in instances where there are concerns regarding po- 
tential distortions to respondents’ answers secondary to mood states or when there is 
a particular need for using a semi-structured interview methodology. 
Summary 


The SIFFM was designed as an interview-based assessment of both adaptive and 
maladaptive features related to the personality traits included in the FFM. The 
SIFFM will be attractive to clinicians and researchers who prefer interview-based 
assessments or who would like to employ an assessment measure that will comple- 
ment the NEO-PI-R. To date, data obtained regarding the reliability and validity of 
SIFFM scores have been encouraging. The SIFFM is easy to administer and to 
score. Further, scores derived from the SIFFM have been shown to be related to 
other measures of personality traits and measures of DSM personality disorders in a 
pattern that would be expected given a knowledge of these traits and disorders. Fi- 
nally, several studies have demonstrated that, by virtue of the measure's design, 
SIFFM scores appear to tap into maladaptive variants of personality traits that may 
not be assessed by other alternative personality measures. Although these results are 
indeed promising, more research is needed to assess the validity and utility of the 
SIFFM. 
Chapter 8 


The nonverbal assessment of personality: The 
NPQ and the FF-NPQ 


Sampo V. Paunonen 
Michael C. Ashton 


Introduction 


The purpose of this chapter is to describe the development of two novel measures of 
personality characteristics. The first measure is called the Nonverbal Personality 
Questionnaire (NPQ) and the second is the Five-Factor Nonverbal Personality 
Questionnaire (FF-NPQ). What makes these two measures novel is that they do not 
employ verbal item content. Nonverbal stimuli are used as items instead, in an other- 
wise standard paper-and-pencil personality questionnaire. 


Nonverbal personality assessment 


Nonverbal measures of personality have existed for many years. Examples include 
the popular Rorschach inkblot test and the Thematic Apperception Test (TAT). 
Those measures, however, are different from the ones to be described in this chapter. 
Whereas the Rorschach and the TAT are both unstructured, in the sense that exami- 
nees are allowed to generate open-ended verbal responses to the items, the NPQ and 
FF-NPQ are structured measures. This means that a person completing the tests 
must choose his or her responses to each nonverbal item from a series of alternatives 
or response options. 

There are some obvious advantages to a nonverbal measure of personality. One 
advantage is for cross-cultural personality research — the fact that item translation is 
not necessary for using such measures in different cultures and language groups 
means that one impediment to such cross-cultural research has been removed. (For 
the NPQ and FF-NPQ, the instruction page has verbal content that would, of course, 
require translation.) A related advantage is that any differences found across cultures 
in the psychometric properties of the nonverbal measure cannot be attributed to poor 
item translations. In contrast, if different reliabilities were found across cultures on a 
verbal personality scale, the difference could partly be due to translated items that 
misrepresent the personality construct. Another advantage of a nonverbal measure of 
personality is the possible application to respondents for whom verbal items pose 
certain difficulties. We refer to individuals with reading difficulties, people with a 
short attention span, young children, and those with a poor grasp of the language. 

The NPQ initially arose out of some research in which we created illustrated be- 
havioral acts as stimulus materials for a study in person perception (Paunonen & 
Jackson, 1979). Those nonverbal stimuli were re-worked into the form of a self- 
report personality questionnaire. That initial questionnaire was then revised into the 
present NPQ. The FF-NPQ was subsequently developed to measure the Big Five 
personality factors using nonverbal items of the type found in the NPQ. In fact, most 
of the items of the FF-NPQ were selected from the NPQ. 


Construction of the NPQ 


In our study of person perception (Paunonen & Jackson, 1979), we required a set of 
behavioral acts, being exemplars of common personality traits, that could be used to 
describe a person, but without words. Visual depictions of behavior were thus 
needed. An artist was consequently commissioned to draw some 200 cartoon-like 
pictures of a person performing specific behaviors in specific situations. The illus- 
trated behaviors were each intended to portray one of 17 traits in Murray's (1938) 
system of needs, with between 9 and 14 behavior scenarios each. Such needs, or 
traits, include Affiliation, Dominance, Nurturance, and so on, and are measured by 
more traditional verbal personality inventories, such as Jackson's (1984) Personality 
Research Form (PRF). 

It occurred to us in preparing the materials for our person perception study that 
the nonverbal items used in that research might also be useful as self-report items in 
a paper-and-pencil personality inventory format. To this end, we prepared an initial 
202-item nonverbal personality questionnaire, which is the basis of the current NPQ, 
having 17 trait scales and an Infrequency validity scale. An example nonverbal item, 
depicting thrill-seeking behavior, can be seen in a reproduction of side one of the 
NPQ Instructions and Rating Form page shown in Figure 1. Other examples of the 
NPQ items can be found in some of the articles cited in this chapter (Paunonen, 
Ashton, & Jackson, 2001; Paunonen & Jackson, 1979; Paunonen, Jackson, & 
Keinonen, 1990; Paunonen, Keinonen, Trzebinski, Forsterling, Grishenko-Rose, 
Kouznetsova, & Chan, 1996). 
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NONVERBAL PERSONALITY 
QUESTIONNAIRE 


Instructions and Rating Form 


Attached is a Picture Booklet containing a series of illustrations depicting a central figure (the
one with the hair drawn in) performing specific behaviors in certain situations.

Please look at each illustration and rate the likelihood that you would engage in the type of behavior
shown.

Using the Rating Form on the other side of this page, record your responses by selecting an
appropriate number from the 7-point rating scale. Consider the example below:


- extremely likely that I would perform this type of behavior
- very likely that I would perform this type of behavior
- moderately likely that I would perform this type of behavior
- neither likely nor unlikely that I would perform this type of behavior
- moderately unlikely that I would perform this type of behavior
- very unlikely that I would perform this type of behavior
- extremely unlikely that I would perform this type of behavior


In this example the person has responded that it would be very likely that he/she would engage in
the kind of activity in which the central figure is engaging. Your own response might have been
different.

In a similar manner, consider each illustration in the Picture Booklet and estimate the likelihood
that you would engage in the type of behavior depicted by the central figure.


Please respond to every picture and record your response on the 7-point rating scales on the back 
of this page. Do not mark the Picture Booklet or any other materials. 


Turn Page Over... 


Figure 1. Instructions and Rating Form page (obverse side) of the Nonverbal Personality
Questionnaire.
Table 1. Nonverbal Personality Questionnaire trait scale names and descriptions

Achievement          Aspires to accomplish difficult tasks; responds positively to competition.
Affiliation          Enjoys being with friends and people in general; is sociable.
Aggression           Enjoys combat and argument; easily annoyed; likes to "get even."
Autonomy             Abhors restraints, confinements, or restrictions imposed by people or places;
                     is independent.
Dominance            Attempts to influence or direct other people; enjoys the role of leader.
Endurance            Is willing to work long hours; does not give up easily.
Exhibition           Wants to be the center of attention; enjoys having an audience.
Impulsivity          Tends to act on the "spur of the moment;" is emotionally volatile.
Nurturance           Gives sympathy and comfort to those in need; assists others whenever
                     possible.
Order                Keeps personal effects and surroundings neat and organized.
Play                 Enjoys games, sports, social activities, and other amusements.
Sentience            Notices smells, sounds, sights, tastes, and textures; maintains an aesthetic
                     view of life.
Social Recognition   Is concerned about reputation and what other people think; seeks their
                     esteem and approval.
Succorance           Frequently seeks sympathy, protection, love, and advice.
Thrill-seeking       Enjoys exciting or dangerous activities; not overly concerned with personal
                     safety.
Understanding        Is intellectually curious and reflective; wants to understand many areas of
                     knowledge.



Note in Figure 1 the instruction to respondents in completing the NPQ; they are 
asked to estimate "the likelihood that you would engage in the type of behavior
shown," using a seven-point rating scale. They are not asked whether they have ever
performed the depicted behavior, or even whether they are likely to perform the be- 
havior exactly as illustrated. Instead, the rating instructions emphasize the idea of 
each behavior item in the questionnaire as being an exemplar of a class or domain of 
behaviors. Also note the central character in each item is intended to be sex-neutral, 
so that both men and women can identify with the depicted behaviors. 

As will be outlined in the following sections, the final NPQ form represents a 
136-item subset of the 202 nonverbal items we initially created (Paunonen & 
Jackson, 1979), the items being keyed on 16 of the 17 original trait domains plus 
Infrequency. (The Infrequency scale contains items that are likely to be endorsed 
only by someone who is completing the questionnaire thoughtlessly. One such item, 
for example, shows a person drinking poison.) The 16 personality traits measured by 
the NPQ are shown in Table 1. Note that the verbal PRF also measures those same 
traits, among a few others. The NPQ, in fact, was first conceived as a nonverbal 
counterpart to the verbal PRF. 


Initial NPQ item analysis 


In our first analysis of the nonverbal personality items, we administered the 202- 
item pool to four groups of respondents (Paunonen et al., 1990). All four samples
consisted of university undergraduates, three of them samples of Canadian students
and one group of students at the University of Helsinki. All samples but one (a
Canadian sample) also completed the verbal PRF measures of Murray's (1938) system
of needs. Because the PRF measures the same traits as does the NPQ (Table 1), we 
could use the former as a criterion for validating the latter. 

On the whole, we found the psychometric properties of the experimental nonver- 
bal scales to be acceptable in that study (Paunonen et al., 1990). In all four samples, 
the internal consistency reliability coefficients were good, with an overall mean of 
.71. This is relatively high considering the scales each had only 11.2 items on
average. The convergent validities of the nonverbal and verbal measures of the same
traits averaged .49 over the different scales and the three respondent samples having 
criterion data. 
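
As an illustration of how an internal consistency coefficient of this kind can be computed, the following Python sketch implements coefficient alpha for a respondents-by-items matrix. The chapter does not say which coefficient was used, so alpha is an assumption here, and the simulated ratings (500 respondents, 11 items) are purely hypothetical.

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for an (n_respondents, n_items) matrix of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)       # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of the scale total
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)

# Hypothetical data: one common trait plus item-specific noise (not NPQ data).
rng = np.random.default_rng(0)
trait = rng.normal(size=(500, 1))
ratings = trait + 2.0 * rng.normal(size=(500, 11))
alpha = cronbach_alpha(ratings)
print(round(alpha, 2))
```

With these simulation settings the items intercorrelate only about .20, yet an 11-item composite still reaches a respectable reliability, which is the sense in which a mean of .71 for scales of about 11 items can be called relatively high.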

It should be noted at this point that the convergent validities between the nonver- 
bal and verbal measures of the same traits should not necessarily be expected to be 
too high. The reason is that the nonverbal depictions of behavior, by their very na- 
ture, can only refer to observable actions of people. Verbal depictions of behavior, in 
contrast, can refer to unobservable behaviors such as wishes, preferences, and sen- 
timents. Therefore, some of the PRF trait scales refer to behaviors that are not repre- 
sented in the corresponding NPQ trait scales, a fact that will tend to attenuate their 
intercorrelations. 

Our next step was to eliminate some of the least psychometrically desirable non- 
verbal items from the 202-item questionnaire. Our target was to have eight items in 
each trait scale. Thus, we used a sequential ranking procedure to select items. This 
involved ordering the items on a trait scale according to some psychometric property 
and eliminating those items that failed to meet a minimum standard. The remaining 
items were then ordered with reference to some other psychometric property. If 
more than eight items survived the various psychometric hurdles, the overall best 
eight items were chosen for the final 136-item NPQ. Our sequential, stepwise item 
selection procedure is illustrated in Figure 2.
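
The logic of such a sequential procedure can be sketched in a few lines of Python. The four item statistics follow the criteria named in Figure 2, but the cut-off values, the use of the PRF correlation as the final ranking criterion, and all names in the code are hypothetical choices made only for illustration; the chapter does not report the actual thresholds.

```python
from dataclasses import dataclass

@dataclass
class ItemStats:
    name: str
    r_own: float          # correlation with the item's own nonverbal scale
    r_irrelevant: float   # correlation with an irrelevant nonverbal scale
    r_infrequency: float  # correlation with the Infrequency scale
    r_prf: float          # correlation with the corresponding verbal PRF scale

def select_items(items, n_keep=8, min_own=0.20, max_irrelevant=0.30,
                 max_infrequency=0.30, min_prf=0.10):
    """Apply the hurdles one after another, then keep the best n_keep survivors.

    All thresholds are assumed values, not those used for the NPQ.
    """
    survivors = [i for i in items if i.r_own >= min_own]
    survivors = [i for i in survivors if abs(i.r_irrelevant) <= max_irrelevant]
    survivors = [i for i in survivors if abs(i.r_infrequency) <= max_infrequency]
    survivors = [i for i in survivors if i.r_prf >= min_prf]
    # Assumed tie-breaking rule: rank the survivors by criterion correlation.
    survivors.sort(key=lambda i: i.r_prf, reverse=True)
    return survivors[:n_keep]

# Hypothetical 11-item scale.
pool = [ItemStats(f"item{k}", 0.15 + 0.03 * k, 0.05, 0.02, 0.10 + 0.02 * k)
        for k in range(11)]
kept = select_items(pool)
print([i.name for i in kept])
```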


The revised nonverbal inventory 


The nonverbal Abasement scale was deleted in the very earliest stages of test con- 
struction because of its generally poor item and scale properties, particularly in the 
Finnish sample. The remaining 16 scales (plus Infrequency) were each shortened to 
eight items using the procedures shown in Figure 2. Despite shortening the scales' 
lengths by about 30 per cent on average, the mean reliability computed on the item 
analysis samples (described above) was .70, a value not much less than the .71 mean
reliability of the longer nonverbal scales. This illustrates an important psychometric 
point: shorter scales are not necessarily less reliable than are longer scales. A few of 
the items in the longer scale could actually be contributing very little to the reliabil- 
ity of the measure, or even detracting from it, because of their poorer psychometric 
properties relative to other scale items (see also Paunonen, 1984; Paunonen & Jack- 
son, 1985). 
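
A worked example makes the point concrete. If all items were psychometrically interchangeable, the Spearman-Brown formula would predict the reliability of a scale shortened from an average of 11.2 items to 8 items. The Python sketch below applies that standard formula to the mean values reported above; the function name is ours.

```python
def spearman_brown(reliability, length_ratio):
    """Predicted reliability when test length is multiplied by length_ratio."""
    return length_ratio * reliability / (1.0 + (length_ratio - 1.0) * reliability)

# Mean reliability .71 for scales averaging 11.2 items, shortened to 8 items.
predicted = spearman_brown(0.71, 8 / 11.2)
print(round(predicted, 2))
```

The prediction under equivalent items is about .64, noticeably below the mean of .70 actually obtained for the eight-item scales. The difference is what one would expect if the deleted items were the ones contributing least to reliability.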
Item's correlation with own nonverbal scale
Item's correlation with irrelevant nonverbal scale
Item's correlation with nonverbal Infrequency scale
Item's correlation with corresponding verbal PRF scale

Figure 2. Flowchart of the construction of the Nonverbal Personality Questionnaire. From "The
structured nonverbal assessment of personality," by S. V. Paunonen, D. N. Jackson, & M. Keinonen,
1990, Journal of Personality, 58, p. 491. Copyright 1990 by Blackwell Publishers.

Poor items included in a scale do not just affect its reliability, but they also can
attenuate the scale's convergent validity. That is, shorter scales can also be more 
valid than the longer scales of which they are a part. Indeed, for the present NPQ 
scales, their convergent validities with the PRF criterion scales generally improved 
after the nonverbal scales were shortened to eight items each. The mean correlation 
between revised scales of the NPQ and the PRF for three respondent samples was 
.52, slightly higher than the .49 mean validity computed for the longer scales.

Another test of the validity of a personality measure that is sometimes applied is 
to correlate self-ratings on the measure with peer ratings of the respondents. This 
was possible in this study (Paunonen et al., 1990) because one of the Canadian
samples included both types of data. Specifically, the subjects in that sample com- 
pleted the NPQ items in a self-report format and a peer report format. (In the latter 
format, respondents were asked to judge a nonverbal item on the likelihood that the 
peer being rated would perform the type of behavior illustrated.) Note that the sub- 
jects rating each other in that study were not all of high acquaintanceship. In fact, 
they were measured beforehand on mutual acquaintanceship and then assigned into 
dyads representing eight different levels of familiarity. Each person in each dyad 
then rated both self and peer on all 136 NPQ items. 

Figure 3 shows the mean self-peer trait correlations on the NPQ as a function of 
degree of acquaintanceship. Also shown is a comparable graph for self-peer correla- 
tions on the verbal PRF trait scales (see Paunonen, 1989, for more details about 
these peer ratings). Both graphs show a generally linear increase in rater accuracy as 
acquaintanceship increases. Note that the average level of correlation for the non- 
verbal scales is about .21, which is not much less than the average correlation of .24 
for the longer and more established PRF scales. Although neither of these two mean 
values seems very high, one must remember that they include varying levels of tar- 
get-rater acquaintanceship. At the highest levels of acquaintanceship, self-peer cor- 
relations are in the neighborhood of .40, both for the verbal and the nonverbal 
scales. We consider this to be more than satisfactory given that (a) the NPQ scales 
are quite short with only eight items each, and (b) the Revised NEO Personality In- 
ventory (NEO-PI-R) domain scales show an average self-peer correlation of around 
.43, and they have 48 items each (Costa & McCrae, 1992, p. 50). 


Cross-cultural evaluations of the NPQ 


Since its construction, we have administered the NPQ to respondents in several dif- 
ferent countries. In all cases, we were interested in the generalizability of the NPQ's 
psychometric properties. Our particular concern was with the nonverbal scales’ reli- 
abilities, criterion validities, and factor structure across cultures. 
Figure 3. Mean self-peer trait correlation on Nonverbal Personality Questionnaire scales (filled
circles) and Personality Research Form scales (open circles), as a function of degree of
acquaintanceship. From "The structured nonverbal assessment of personality," by S. V. Paunonen, D. N.
Jackson, & M. Keinonen, 1990, Journal of Personality, 58, p. 497. Copyright 1990 by Blackwell
Publishers.


Psychometric properties 


In a study reported by Paunonen, Jackson, Trzebinski, and Forsterling (1992), our
primary goal was to provide a multimethod cross-cultural evaluation of personality 
structure. We were interested particularly in seeking evidence for the Five-Factor 
Model of personality, using both verbal and nonverbal data collected in different 
countries. At the same time, our purpose was to evaluate the factorial validity of the 
NPQ scales. Any new omnibus measure of personality must, in general, fit into the 
trait nomological network established by existing measures of the same constructs 
(Loevinger, 1957). That is, the personality factors underlying a new measure should 
reproduce the factors underlying an established measure. If the factors are different,
then one can reasonably question the construct validity of the new personality
instrument.

Recall that the NPQ was designed to measure most of the same traits measured 
by the PRF. Also recall that the NPQ scales have shown acceptable levels of corre- 
lation with corresponding PRF scales. However, the non-corresponding scales 
should also correlate with roughly the same pattern in the two inventories. Thus, the 
intercorrelations among the NPQ scales should be approximately the same as the 
intercorrelations among the PRF scales. This is the issue of the nonverbal inven- 
tory's factor structure, referred to above, which bears on the issue of its construct 
validity. And, factor analysis is a method perfectly suited for addressing those is- 
sues. 

In this study (Paunonen et al., 1992), we evaluated NPQ data from respondents
in four countries: Canada, Finland (these two samples were the same as those used
by Paunonen et al., 1990), Poland, and Germany. Criterion scores on the PRF scales
were also available for the three European groups. The NPQ internal consistency
reliabilities were good, with an overall mean of .67 (Canada = .65, Finland = .70,
Poland = .65, Germany = .67). The mean NPQ-PRF convergent validity for the
corresponding trait scales in the three European samples was .46 (Finland = .50, Poland
= .40, Germany = .47), a value slightly lower than the .52 reported by Paunonen et 
al. (1990). Discriminant validity estimates were also computed for the NPQ trait 
scales as the mean absolute heterotrait correlation. These mean values were .16, .14, 
and .15, for Finland, Poland, and Germany, respectively. Those discriminant 
validities were, of course, appropriately low. 
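
The two summary indices can be illustrated with a short Python sketch. It assumes that convergent validity is the mean of the same-trait NPQ-PRF correlations and that discriminant validity is the mean absolute value of the different-trait NPQ-PRF correlations; the exact set of heterotrait correlations averaged in the original study is not stated here, so that reading is an assumption, and the data are simulated.

```python
import numpy as np

def validity_summary(npq_scores, prf_scores):
    """Return (mean convergent r, mean absolute heterotrait r).

    Both inputs are (n_respondents, n_traits) arrays with traits in the same order.
    """
    npq_scores = np.asarray(npq_scores, dtype=float)
    prf_scores = np.asarray(prf_scores, dtype=float)
    n_traits = npq_scores.shape[1]
    full = np.corrcoef(np.hstack([npq_scores, prf_scores]), rowvar=False)
    cross = full[:n_traits, n_traits:]             # NPQ rows by PRF columns
    convergent = np.diag(cross).mean()             # same trait, different method
    off_diagonal = ~np.eye(n_traits, dtype=bool)
    discriminant = np.abs(cross[off_diagonal]).mean()
    return convergent, discriminant

# Hypothetical data: three independent traits, each measured by two noisy methods.
rng = np.random.default_rng(2)
true_scores = rng.normal(size=(1000, 3))
npq = true_scores + rng.normal(size=(1000, 3))
prf = true_scores + rng.normal(size=(1000, 3))
convergent, discriminant = validity_summary(npq, prf)
print(round(convergent, 2), round(discriminant, 2))
```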

Results of factor analyses of the four samples' verbal and nonverbal personality 
data were remarkably consistent. In each analysis, we found five factors using the 
scree test. Furthermore, when those factors were rotated to a common orientation in 
the factor space (in this case, the PRF factor structure reported by Skinner, Jackson, 
& Rampton, 1976; cf. Ashton, Jackson, Helmes, & Paunonen, 1998; Jackson,
Paunonen, Fraboni, & Goffin, 1996) using an orthogonal Procrustes transformation
(Schönemann, 1966), the solutions were quite similar. The NPQ-PRF factor
comparisons for corresponding dimensions yielded a mean coefficient of congruence of
.85, averaged across all within-country factor comparisons. There was no tendency 
for any one country to have higher factor convergence than another, in general, or 
for any one factor to have higher convergence. And, only one NPQ-PRF factor 
comparison (for the Polish sample on what was labeled the Aesthetic-Intellectual 
factor by Skinner et al., 1976) failed our significance test of the congruence coeffi- 
cient based on random data (see Paunonen, 1997). All of the within-country and 
between-country factor congruence coefficients are reproduced in Table 2. 
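
For readers unfamiliar with these two steps, the following Python sketch shows the standard singular-value solution to the orthogonal Procrustes problem and Tucker's coefficient of congruence computed column by column. We assume that the congruence coefficient used in the study is Tucker's; the loading matrices in the example are random numbers, not NPQ or PRF results.

```python
import numpy as np

def procrustes_rotate(loadings, target):
    """Rotate loadings orthogonally to be as close as possible to target.

    Finds the orthogonal T minimising ||loadings @ T - target|| (Frobenius norm).
    """
    u, _, vt = np.linalg.svd(loadings.T @ target)
    return loadings @ (u @ vt)

def congruence(a, b):
    """Coefficient of congruence between corresponding columns of a and b."""
    return (a * b).sum(axis=0) / np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))

# Hypothetical check: a target matrix (16 scales, 5 factors) and an arbitrarily
# rotated copy of it; rotation back to the target should give congruence of 1.
rng = np.random.default_rng(1)
target = rng.normal(size=(16, 5))
q, _ = np.linalg.qr(rng.normal(size=(5, 5)))   # random orthogonal matrix
rotated = procrustes_rotate(target @ q.T, target)
phi = congruence(rotated, target)
print(np.round(phi, 2))
```

Because the rotation is fitted to the target, even random data can produce nontrivial congruence values, which is why a significance test against random data is applied to the coefficients.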

Another important result of this study (Paunonen et al., 1992) was in the nature
of the factors we discovered in the cross-cultural data with both the verbal and the 
nonverbal personality inventories. Those factors clearly resembled the traditional 
Big Five personality dimensions. In fact, they had high congruence coefficients with 
independent factors discovered by Costa and McCrae (1988) in PRF data, which 
those authors interpreted as strongly suggesting the Big Five. The mean congruence 
Table 2. Coefficients of congruence between corresponding factors derived from responses to the
Personality Research Form (PRF) and the Nonverbal Personality Questionnaire (NPQ) in four
countries

                                                  Factor
                        Academic      Aesthetic-                                Social
Comparison              Orientation   Intellectual  Autonomy     Aggression    Control
PRF-NPQ Canada          .91           .78           .92          .84           .88
PRF-NPQ Finland         .83           .89           .90          .86           .89
PRF-NPQ Poland          .85           .61           .80          .72           .90
PRF-NPQ Germany         .90           .83           .93          .74           .92
Canada-Finland PRF      .96           .90           .97          .92           .91
Canada-Poland PRF       .94           .93           .95          .93           .96
Canada-Germany PRF      .97           .94           .96          .92           .97
Finland-Poland PRF      .92           .82           .95          .88           .94
Finland-Germany PRF     .95           .84           .97          .89           .89
Poland-Germany PRF      .95           .95           .96          .93           .95
Canada-Finland NPQ      .93           .98           .97          .99           .97
Canada-Poland NPQ       .93           .96           .90          .96           [illegible]
Canada-Germany NPQ      .93           .96           .92          .80           .97
Finland-Poland NPQ      .93           .93           .89          .95           .91
Finland-Germany NPQ     .94           .95           .86          .76           [illegible]
Poland-Germany NPQ      .91           .95           .86          .82           .92


Note: Factor labels from the Skinner et al. (1976) solution. From "Personality structure across
cultures: A multimethod evaluation," by S. V. Paunonen, D. N. Jackson, J. Trzebinski, & F.
Forsterling, 1992, Journal of Personality and Social Psychology, 62, p. 452. Copyright 1992 by the
American Psychological Association.


coefficient between the present factors and the Costa and McCrae Big Five was .83 
for our NPQ data and .92 for our PRF data, averaged across factors and countries. 
The slightly lower mean convergence for the nonverbal data notwithstanding (due, 
no doubt, to different behaviors represented in the nonverbal and verbal items), these 
results suggest that the Big Five factors do not depend on the explicit use of lan- 
guage in the measurement of personality. 


Replicated psychometric properties 


In another study (Paunonen et al., 1996), we collected more cross-cultural data with
the NPQ. Specifically, we administered the inventory to respondents in Canada, 
Finland, Poland, Germany, Russia, and Hong Kong. (For the four countries already 
referred to earlier in this chapter, new samples of subjects were obtained.) Also
administered to respondents was the verbal PRF questionnaire as a criterion measure.
The goal of this study was to evaluate the replicabilities of the NPQ psychometric 
properties discovered in previous studies. Of particular interest was the inclusion in 
this study of the Chinese city-state of Hong Kong. 

The internal consistency reliabilities for the NPQ were, averaged across scales, 
.75 for the Canadian sample, .67 for the four European samples, and .61 for the
Chinese sample. A similar ordering of reliabilities was found for the PRF verbal scales.
We speculated that these cultural differences in reliability could reflect some
problems with the translation of the NPQ rating instructions and/or the PRF verbal items,
particularly into Chinese. Alternatively, the cultural differences could reflect a less 
than optimal applicability of the traits or items to respondents in non-North Ameri- 
can cultures. 

The NPQ scale validities also showed some cultural differences. The convergent
validities were quite good only in the Canadian sample, averaging .51 across scales. 
Validities were more moderate in the four European samples, with a mean of .39, 
and relatively low in the Chinese sample, with a mean of .28. These cultural differ- 
ences could, again, indicate that the translations of the verbal criteria into languages 
other than English may have been suboptimal for some of the scales in some of the 
cultures. Or, the differences in validities might suggest that North American traits or 
their behavior-in-situation exemplars do not necessarily apply to other cultures, par- 
ticularly to a Chinese culture. We have more to say about these possibilities later in 
this chapter. 

To evaluate the factor structure of the NPQ, the 16 personality scales in each data 
set were intercorrelated and factored by the method of principal components.
Eigenvalue and scree criteria indicated between four and six factors across the six
countries. Five factors, however, seemed to provide an adequate and satisfactory solution
in all cases. Thus, five factors were extracted and then rotated with a Procrustes
transformation (Schönemann, 1966) to the Big Five target structure presumably
underlying the PRF traits, as reported by Costa and McCrae (1988). The results of
those orthogonal factor rotations showed very good convergences, in general, for
corresponding NPQ dimensions across the six cultures. The mean congruence
coefficient calculated across all culture-culture factor comparisons was .90. The
comparable mean value for the PRF data was slightly lower at .83.
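
The extraction step described above can be sketched in Python as an eigendecomposition of the scale intercorrelation matrix. The sketch assumes that the eigenvalue criterion is the common eigenvalue-greater-than-one rule, which the text does not state explicitly, and it uses simulated data with six variables and two underlying factors rather than the 16 NPQ scales.

```python
import numpy as np

def principal_components(corr, n_factors):
    """Return all eigenvalues (descending) and the first n_factors loading columns."""
    eigvals, eigvecs = np.linalg.eigh(corr)        # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    return eigvals, loadings

# Hypothetical data: two independent factors, three indicator variables each.
rng = np.random.default_rng(3)
factors = rng.normal(size=(2000, 2))
scores = np.repeat(factors, 3, axis=1) + 0.6 * rng.normal(size=(2000, 6))
corr = np.corrcoef(scores, rowvar=False)

eigvals, loadings = principal_components(corr, n_factors=2)
n_by_eigenvalue_rule = int((eigvals > 1.0).sum())
print(np.round(eigvals, 2), n_by_eigenvalue_rule)
```

The unrotated loadings returned here would then be rotated toward a target matrix, for example with an orthogonal Procrustes transformation.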

Another cross-cultural investigation of the psychometric properties of the NPQ 
was recently conducted by Lee, Ashton, Hong, and Park (2000) with 221 university 
student respondents in South Korea. The internal-consistency reliabilities of the 
NPQ scales in that sample averaged .71. This mean value is noticeably higher than 
the mean reliability of .61 that was reported in another East Asian culture, Hong 
Kong (see above). Lee et al. factor-analyzed the NPQ scales, rotating the factors to 
a Big Five solution using an orthogonal Procrustes rotation. The resulting factors 
showed high and statistically significant congruence coefficients with the PRF Big 
Five factors identified by Costa and McCrae (1988), ranging from .79 to .94. 


Meta-analysis of NPQ factors 


In a recent study, we sought once again to evaluate the generalizability of the NPQ 
scales’ psychometric properties across different cultures (Paunonen, Zeidner, 
Engvik, Oosterveld, & Maliphant, 2000). In this case, however, we wanted to extend 
our analyses to new cultures not used in the past, including a Middle Eastern culture. 
Also, we decided to use a meta-analytic approach to resolving the factor structure of 
the NPQ scales, which is designed to produce a composite structure that best
represents all of the countries assessed. To this end, we applied a meta-analytic technique
for factor analysis proposed by Becker (1996).

Respondents in this study (Paunonen et al., 2000) were university students in 
Canada, England, the Netherlands, Norway, and Israel. All subjects were adminis- 
tered the nonverbal items of the NPQ and the verbal items of the PRF. The NPQ 
reliabilities were, on the whole, quite good considering that the scales consist of 
only eight items each. The mean internal consistency coefficients, averaged across 
the 16 NPQ scales, ranged from .72 for the English sample to .65 for the Israeli 
sample. The range of NPQ-PRF convergent validities, averaged across correspond- 
ing trait scales, was from .48 for the Norwegian data to .27 for the Israeli data. The 
relatively low mean validity for the Israeli sample is equivalent to the validity re- 
ported for the NPQ scales in the Chinese sample we described earlier (Paunonen et 
al., 1996). 

The NPQ scales next were intercorrelated and factored by the method of principal 
components. Not only did we analyze the data independently by country, but we 
also followed Becker's (1996) procedure for combining those data in one meta- 
analysis. Essentially, a weighted means procedure is used to aggregate the countries’ 
individual product-moment intercorrelation matrices into a single pooled matrix.
The variable intercorrelations are then disattenuated for unreliability before factor- 
ing, so that the resultant structure is not affected by differential errors in the meas- 
urements. 
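
The two operations just described, pooling and correction for unreliability, can be sketched in Python as follows. This is a deliberately simplified illustration, not Becker's (1996) estimator: the pooled matrix is a plain sample-size-weighted mean of the correlation matrices, and the correction is the classical formula that divides each correlation by the square root of the product of the two reliabilities. All numbers are hypothetical.

```python
import numpy as np

def pool_correlations(matrices, sample_sizes):
    """Sample-size-weighted mean of several correlation matrices (assumed weights)."""
    weights = np.asarray(sample_sizes, dtype=float)
    weights = weights / weights.sum()
    return np.tensordot(weights, np.stack(matrices), axes=1)

def disattenuate(corr, reliabilities):
    """Classical correction for attenuation applied to every off-diagonal entry."""
    rel = np.asarray(reliabilities, dtype=float)
    corrected = corr / np.sqrt(np.outer(rel, rel))
    np.fill_diagonal(corrected, 1.0)
    return corrected

# Hypothetical two-scale example from two countries.
r_country_a = np.array([[1.0, 0.2], [0.2, 1.0]])
r_country_b = np.array([[1.0, 0.4], [0.4, 1.0]])
pooled = pool_correlations([r_country_a, r_country_b], sample_sizes=[100, 300])
corrected = disattenuate(pooled, reliabilities=[0.7, 0.7])
print(pooled[0, 1], corrected[0, 1])
```

In this toy case the pooled correlation is .35 and the disattenuated value is .50. Corrected correlations can exceed 1.0 in real data when reliabilities are low, so the corrected matrix should be inspected before it is factored.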

Five factors were extracted from the NPQ trait scores in the pooled data, ac- 
counting for 83.8 per cent of the scales' variance. The factors were then rotated to a 
target matrix using an orthogonal Procrustes procedure (Schönemann, 1966). The
target chosen represented the factor structure matrix of the PRF traits presented by 
Costa and McCrae (1988) and interpreted by them as representing the Big Five. The 
results of the present meta-factor analysis for the combined NPQ data sets are shown 
in Table 3. 

The NPQ meta-factor solution shown in Table 3 is very clear with regard to the 
Big Five, and we think most will agree with our choice of factor labels. Only two 
traits loaded more highly on a factor other than that targeted for the variable:
Dominance tended to load on the Extraversion factor instead of its targeted
Agreeableness factor, and Understanding tended to define Conscientiousness somewhat more
than its intended Openness to Experience factor. Note that the NPQ version of Neu- 
roticism, being defined by Social Recognition and Succorance, is somewhat re- 
stricted to a dependence on others for acceptance and emotional support. This stands 
in contrast to the traditional version of Neuroticism, which is related more to anxiety 
and depressed mood. This is an issue we address again in the development of the 
FF-NPQ, described in the following sections. 

The results of this study (Paunonen et al., 2000) are largely consistent with other 
cross-cultural findings we have reported for the NPQ. Some of the implications of 
this consistency are that (a) the nonverbal personality scales have good reliability, 
criterion validity, and factorial validity; (b) the scales are applicable across many 
Table 3. Procrustes rotated meta-factors of combined Nonverbal Personality Questionnaire (NPQ)
data from 5 countries

NPQ scale                     E            A            C            N            O
Extraversion (E)
  Affiliation                 .74          .44          .12          .32          .23
  Exhibition                  .80          -.29         -.16         .13          .25
  Play                        .68          -.50         .03          .04          [illegible]
Agreeableness (A)
  Nurturance                  .50          .61          [illegible]  .21          [illegible]
  Aggression                  .20          -.80         -.27         .27          .03
  Dominance                   .65          -.34         .24          [illegible]  .39
Conscientiousness (C)
  Achievement                 -.23         -.19         .74          .26          .46
  Endurance                   [illegible]  -.06         [illegible]  -.15         .56
  Order                       .03          .22          .70          .45          .02
  Impulsivity                 .47          -.42         -.54         .40          .26
Neuroticism (N)
  Social recognition          .40          -.47         .24          .64          .00
  Succorance                  .10          [illegible]  -.09         .83          .25
Openness to Experience (O)
  Autonomy                    .12          .00          .08          -.09         .93
  Thrill-seeking              .39          -.41         -.08         -.29         .66
  Sentience                   .18          .41          [illegible]  .43          .65
  Understanding               -.06         -.09         .61          .20          .56


Note: Targeted loadings in italics. From "The nonverbal assessment of personality in five cultures,"
by S. V. Paunonen, M. Zeidner, H. Engvik, P. Oosterveld, & R. Maliphant, 2000, Journal of
Cross-Cultural Psychology, 31, p. 232. Copyright 2000 by Western Washington University.


cultures and can be used for cross-cultural personality assessments; and (c) the 
structure of personality has some generality across cultures and, furthermore, (d) that 
structure tends to support the well-known Five-Factor Model of personality. 


Construction of the FF-NPQ 


Although the NPQ provides psychometrically sound measures of a wide array of 
personality trait variables, we realized that in some settings a nonverbal measure of a 
few broad personality dimensions might be desired. To provide such a measure, we 
decided to construct a new questionnaire that would assess personality variation 
only at the broad level of abstraction represented by the Big Five factors. That new 
questionnaire is known as the Five-Factor Nonverbal Personality Questionnaire (FF- 
NPQ; Paunonen et al., 2001). 
Overview of the FF-NPQ


The FF-NPQ contains 60 nonverbal items that measure, with 12 items each, the Big 
Five personality factors of Extraversion, Agreeableness, Conscientiousness, Neu- 
roticism, and Openness to Experience. Those items, with a few exceptions to be dis- 
cussed below, were selected from the longer 136-item NPQ to represent each of the 
Big Five factor domains. In completing the FF-NPQ, essentially the same rating 
instructions are given respondents as those shown in Figure 1 for the NPQ. Because 
of its short length of 60 items, most people finish the five-factor inventory in about 
10 minutes. 


Item selection strategy 


Our first task in constructing the FF-NPQ (see Paunonen et al., 2001) was to decide
which NPQ trait scales should contribute to the item pool for each nonverbal Big 
Five factor measure. We initially considered the possibility of assigning traits to 
factors based on the NPQ meta-factor analysis reported in Table 3. We noticed, 
however, that those factors might represent somewhat unorthodox rotations (from 
the point of view of the conventional Five-Factor Model) of some of the Big Five 
axes. For example, the NPQ Understanding scale, which might normally be viewed 
as a univocal marker of Openness to Experience, loaded slightly more highly on 
Conscientiousness than on Openness. We decided, therefore, to verify the relevance 
of each NPQ trait scale to the traditional Big Five domains by correlating those 
scales with known markers of the Big Five structure; specifically, we correlated the 
NPQ scales with the scales of the NEO Five-Factor Inventory (NEO-FFI; Costa & 
McCrae, 1992). 

Correlations between the NPQ and NEO-FFI scales were calculated on a sample 
of 304 Canadian university student respondents (112 men, 192 women). Using a .30 
correlation as a threshold for assigning an NPQ scale to a Big Five domain, we then 
made the following trait-factor assignments: Affiliation, Dominance, and Exhibition
were assigned to Extraversion; Nurturance and (low) Aggression were assigned to 
Agreeableness; Achievement, Endurance, (low) Impulsivity, and Order were as- 
signed to Conscientiousness; Succorance alone was assigned to Neuroticism; and 
Autonomy, Sentience, and Understanding were assigned to Openness to Experience. 
Three NPQ scales, Thrill-Seeking, Social Recognition, and Play, did not correlate 
.30 or above with any NEO-FFI scale, and were thus excluded from the FF-NPQ 
item selection process. 
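
The assignment rule can be expressed compactly in Python. In the sketch below, the correlations are invented for illustration and are not the values obtained in the study; the rule for a scale that passes the threshold on more than one domain (assign it to the domain with the largest absolute correlation) is also our assumption, since the text does not describe that case.

```python
def assign_scales(correlations, threshold=0.30):
    """Map each trait scale to (domain, keying) or to None if no |r| reaches threshold.

    correlations: {scale: {domain: r}}; keying is "low" for a negative correlation.
    """
    assignments = {}
    for scale, by_domain in correlations.items():
        domain, r = max(by_domain.items(), key=lambda pair: abs(pair[1]))
        if abs(r) >= threshold:
            assignments[scale] = (domain, "low" if r < 0 else "high")
        else:
            assignments[scale] = None   # excluded from item selection
    return assignments

# Hypothetical correlations with three of the five NEO-FFI domains.
example = {
    "Affiliation": {"E": 0.48, "A": 0.22, "N": -0.05},
    "Aggression":  {"E": 0.10, "A": -0.45, "N": 0.12},
    "Play":        {"E": 0.25, "A": -0.08, "N": 0.02},
}
result = assign_scales(example)
print(result)
```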

For each Big Five domain, we wished to create a new, 12-item, nonverbal factor 
scale by selecting items from the relevant set of 8-item NPQ trait scales. However, 
in the case of the Neuroticism domain, the only eligible NPQ scale was Succorance, 
so the eight items of that scale, and no others, were automatically included in the 
Table 4. Means and standard deviations for FF-NPQ scales in a Canadian sample

                          Full sample                Men                        Women
                          (N = 304)                  (N = 112)                  (N = 192)
FF-NPQ scale              Mean         SD            Mean         SD            Mean         SD
Extraversion              48.9         11.9          48.9         12.2          48.9         [illegible]
Agreeableness             64.4         [illegible]   58.3         11.3          67.9         9.7
Conscientiousness         56.4         10.7          53.0         10.5          58.3         10.4
Neuroticism               [illegible]  [illegible]   [illegible]  [illegible]   34.7         [illegible]
Openness to Experience    57.3         [illegible]   54.1         10.4          59.1         [illegible]


Note: All item responses recorded on 7-point rating scales; Each FF-NPQ scale has 12 items, except
Neuroticism with 8; FF-NPQ = Five-Factor Nonverbal Personality Questionnaire. From "Nonverbal
assessment of the Big Five personality factors," by S. V. Paunonen, M. C. Ashton, & D. N. Jackson,
2001, European Journal of Personality, 15, p. 8. Copyright 2001 by John Wiley & Sons.


FF-NPQ Neuroticism scale. For the remaining four Big Five domains, we pooled the 
items from the relevant NPQ trait scales, and then calculated corrected item-total 
correlations for each of the preliminary Big Five measures, retaining the best items. 
Besides considering item-total correlations in our item selection procedure, we de- 
leted any item that showed a higher correlation with the total score on another Big 
Five domain, and also excluded items that depicted behaviors whose cross-cultural 
generality seemed to be less than desirable. 
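
The statistical part of the item-selection rules described above can be sketched in Python. This is only an illustration under stated assumptions: the data are simulated, the function names (corrected_item_total, select_items) are hypothetical, and the selection rule is a simplified reading of the procedure rather than the program actually used. The judgment about the cross-cultural generality of an item's content is not modeled.

```python
import numpy as np

def corrected_item_total(items):
    """Correlate each item with the sum of the OTHER items in its scale.

    items: 2-D array, rows = respondents, columns = items.
    """
    items = np.asarray(items, dtype=float)
    total = items.sum(axis=1)
    out = []
    for j in range(items.shape[1]):
        rest = total - items[:, j]  # scale total excluding item j
        out.append(np.corrcoef(items[:, j], rest)[0, 1])
    return np.array(out)

def select_items(items, other_totals, n_keep):
    """Hypothetical rule: drop any item that correlates more highly with
    another domain's total than with its own scale, then keep the n_keep
    items with the highest corrected item-total correlations."""
    items = np.asarray(items, dtype=float)
    r_own = corrected_item_total(items)
    keep = []
    for j in range(items.shape[1]):
        r_other = max(abs(np.corrcoef(items[:, j], t)[0, 1]) for t in other_totals)
        if r_own[j] > r_other:
            keep.append(j)
    keep.sort(key=lambda j: r_own[j], reverse=True)
    return keep[:n_keep]

# Simulated data only: 300 respondents, six items driven to different
# degrees by one trait, plus the total score of an unrelated domain.
rng = np.random.default_rng(0)
trait = rng.normal(size=300)
loadings = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.0])
items = trait[:, None] * loadings + rng.normal(size=(300, 6))
other_total = rng.normal(size=300)

r = corrected_item_total(items)
chosen = select_items(items, [other_total], n_keep=4)
```

In this toy example the last item, which carries no trait variance, has the lowest corrected item-total correlation and is not among the four items retained.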

The final composition of each FF-NPQ factor scale, in terms of the constituent 
NPQ trait scales and the numbers of items selected therefrom (in parentheses), was 
as follows. Extraversion: Affiliation (3), Dominance (3), Exhibition (6); Agreeable- 
ness: Nurturance (5), Aggression (7, reverse-keyed); Conscientiousness: Achieve- 
ment (4), Endurance (3), Order (5); Neuroticism: Succorance (8); Openness to Expe- 
rience: Autonomy (3), Sentience (4), Understanding (5). Note that the version of the 
FF-NPQ which we evaluated in this study (Paunonen et al., 2001) had only 56 
items in total. Since that study, the Neuroticism scale has been revised and length- 
ened from 8 to 12 items, a procedure we describe later in this chapter. 


Psychometric properties 


We evaluated the psychometric properties of the 56-item FF-NPQ inventory within 
the derivation sample of 304 Canadian respondents described above. First, we con- 
sidered the scale means and standard deviations, which are shown in Table 4. These 
values indicate that the scale means were relatively close to the hypothetical scale 
midpoints (32 for Neuroticism, 48 for all other scales), and that the standard devia- 
tions were close to one-sixth of the possible range of scores (56 - 8 = 48 for Neuroti- 
cism, 84 - 12 = 72 for all other scales). Women averaged about a standard deviation 
higher than men on Agreeableness, and about half of a standard deviation higher 
than men on Conscientiousness, Neuroticism, and Openness to Experience. Women 
and men showed approximately equal mean scores on Extraversion. 
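
The benchmark values quoted in this paragraph follow from simple arithmetic on the response format. The short Python sketch below reproduces them, assuming each item is scored from 1 to 7 (which is what the stated score ranges of 8 to 56 and 12 to 84 imply); the function name scale_summary is hypothetical.

```python
def scale_summary(n_items, low=1, high=7):
    """Return (midpoint, possible range, range / 6) for a summed rating scale."""
    minimum = n_items * low
    maximum = n_items * high
    midpoint = (minimum + maximum) / 2
    score_range = maximum - minimum
    return midpoint, score_range, score_range / 6

twelve = scale_summary(12)  # 12-item scales: midpoint 48, range 72, range/6 = 12
eight = scale_summary(8)    # 8-item Neuroticism scale: midpoint 32, range 48, range/6 = 8
```

Against these benchmarks, the legible standard deviations for the 12-item scales in Table 4 (roughly 10 to 12) sit close to the one-sixth-of-range value of 12.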

Table 5 summarizes some of the other psychometric properties of the FF-NPQ 
factor scales in our sample. The internal-consistency reliabilities of the scales (Table 
Table 5. Psychometric properties of FF-NPQ scales in a Canadian sample 


                          Internal        Correlation      Self-peer 
                          consistency     with NEO-FFI     correlation 
FF-NPQ scale              (N = 304)       (N = 304)        (N = 96) 
Extraversion              .81             .58              .45 
Agreeableness             .82             .59              .40 
Conscientiousness         .79             .50              .41 
Neuroticism               .75             .45              .39 
Openness to Experience    .82             .55              .38 


Note: FF-NPQ = Five-Factor Nonverbal Personality Questionnaire; NEO-FFI = NEO Five-Factor Inven- 
tory. 


5, column 1) were quite good, ranging from .75 to .82, and averaging .80; even the 
eight-item Neuroticism scale had a respectable reliability. Correlations between the 
FF-NPQ scales and their NEO-FFI counterparts (Table 5, column 2) revealed fairly 
high levels of convergent validity for the FF-NPQ. Those correlations ranged from 
.45 to .59, and the average convergence of .52 is not much different from values 
normally obtained between different sets of verbal markers of the Big Five (e.g., 
Costa & McCrae, 1992, p. 54). In contrast, none of the discriminant correlations 
between FF-NPQ and NEO-FFI scales exceeded .25, and the mean absolute value 
was an appropriately low .14 (see Paunonen et al., 2001). 
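
For readers who want to compute an internal-consistency estimate of the kind reported in Table 5, the following Python sketch implements coefficient alpha. It assumes that the reported reliabilities are alpha coefficients, which the text does not state explicitly, and it uses simulated responses rather than the FF-NPQ data; the function name cronbach_alpha is hypothetical.

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for a respondents-by-items score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                               # number of items
    item_variances = items.var(axis=0, ddof=1)       # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of the scale total
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)

# Simulated data only: 304 respondents answering eight items that share
# a single underlying trait plus independent noise.
rng = np.random.default_rng(0)
trait = rng.normal(size=(304, 1))
eight_items = 0.6 * trait + rng.normal(size=(304, 8))
alpha_8 = cronbach_alpha(eight_items)
```

Alpha depends on both the number of items and their average intercorrelation, which is why a shorter scale such as the eight-item Neuroticism measure would be expected to show a somewhat lower value than a 12-item scale of similar item quality.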

A subset of 96 students in the sample of 304 respondents completed the FF-NPQ 
in both a self-report and peer report format. We were thus able to correlate each of 
these 96 participants' self-report scores on the FF-NPQ scales with peer report 
scores on the same scales. These calculations revealed that the five convergent cor- 
relations between self and peer, shown in column 3 of Table 5, ranged from .38 to 
.45, with a mean of .41. This value compares favorably with the .43 correlation 
between self- and peer report scores on the much longer, 48-item NEO-PI-R domain 
scales (Costa & McCrae, 1992, p. 50). The self-peer discriminant correlations for the 
FF-NPQ scales were quite low, with an average value of only .08, and with no 
correlation exceeding .16. 

Most respondents in the FF-NPQ validation sample also completed the Behavior 
Report Form (Paunonen, 1993, 1998; Paunonen & Ashton, 2001b), a self-report 
measure designed to assess criteria of some social significance, such as traffic viola- 
tions and alcohol consumption, that might have underlying personality determinants. 
Across 14 such criteria, each assessed only as a single item, the mean multiple cor- 
relation yielded by the five FF-NPQ factor scales as predictors was .25, which ex- 
actly equaled the corresponding value obtained by the five NEO-FFI factor scales as 
predictors (see Paunonen et al., 2001, Table 6). Thus, the estimated external crite- 
rion validity of the new nonverbal FF-NPQ measures of the Big Five was equal to 
that of established verbal measures of the same constructs. 
Table 6. Psychometric properties of FF-NPQ scales in international samples 


FF-NPQ scale              Internal consistency    Correlation with PRF 
Extraversion              .79                     .54 
Agreeableness             .72                     .50 
Conscientiousness         .71                     .50 
Neuroticism               .64                     .35 
Openness to Experience    .77                     .50 


Note: Each coefficient is the average of seven countries' results (N = 90-113); FF-NPQ = Five-Factor 
Nonverbal Personality Questionnaire; PRF = Personality Research Form. 


Cross-cultural evaluations 


Given that the NPQ has shown good psychometric properties in various cultures, the 
prospects for the cross-cultural applicability of the FF-NPQ seem promising. We 
conducted an initial investigation of the FF-NPQ across cultures (Paunonen et al., 
2001) using archival data from samples of respondents already described in this 
chapter. They included 701 university students (447 women, 254 men) in Canada, 
England, Finland, Germany, Norway, Poland, and Russia. All of the participants 
completed the nonverbal NPQ scales, from which FF-NPQ scores could be calcu- 
lated, in addition to the verbal PRF scales (some of the latter were in an abbreviated 
form). 

Our results showed that the means and standard deviations for men and women 
on the FF-NPQ scales for the international samples were close to the values reported 
for the normative Canadian sample. Furthermore, as shown in column 1 of Table 6, 
internal-consistency reliabilities, averaged across countries, were quite satisfactory, 
ranging from .64 to .77 for the five scales. The mean internal-consistency reliabil- 
ities, averaged across the five scales, varied slightly by country, ranging from .66 in 
Finland to .79 in England. The FF-NPQ scale intercorrelations were relatively small, 
with no correlations exceeding .39, and with a mean absolute correlation of .22; 
these values tended to be similar across the individual countries in this data set. 

We calculated correlations between the FF-NPQ nonverbal measures of the Big 
Five and verbal measures of the Big Five that were constructed from the PRF scales. 
To construct each PRF-based Big Five measure, we simply summed the PRF scales 
whose NPQ counterparts had provided items for the corresponding FF-NPQ factor 
measure (as described in an earlier section). Convergent correlations between the 
PRF and FF-NPQ measures of the Big Five, shown in column 2 of Table 6, ranged 
from .35 to .54, with an average of .48. The average convergent correlation differed 
a bit by country, ranging from .40 in Finland to .55 in Norway and in England. We 
consider these values to be quite satisfactory considering (a) the relatively short 
length of the nonverbal factor scales, and (b) the fact that the verbal and nonverbal 
scales undoubtedly measure somewhat different aspects of the same personality con- 
structs. The discriminant correlations between PRF and FF-NPQ scales were gener- 
ally small in the international data, with none exceeding .26, having a mean absolute 
correlation of .10, and being similar in size across countries. 
Postscript 


The psychometric performance of the FF-NPQ, as evaluated in Canadian and European 
data, appears to be quite good in all respects (Paunonen et al., 2001). The nonverbal 
scales have appropriate means and standard deviations, good reliabilities, 
acceptable levels of convergent and discriminant validity, and are predictive of spe- 
cific behavior criteria. It was clear in some of those results, however, that the weak- 
est FF-NPQ scale of the five was Neuroticism, which generally showed slightly 
lower reliabilities and convergent correlations than did the other four factor scales. 
This was probably due to the brevity of the Neuroticism scale (8 items, versus 12 
items for the other scales) and, in the case of the convergent correlations with the 
NEO-FFI variables, perhaps due also to the limited range of content in this scale 
(i.e., being related only to the need for succorance). 

To improve the psychometric quality of the FF-NPQ Neuroticism scale, we have 
since commissioned the drawing of new nonverbal items, and conducted item trials 
based on a new respondent sample. Specifically, 13 new items were drawn to represent 
a few different aspects of behavior related to nonclinical neuroticism (e.g., phobias, 
depression). Those items were then administered to 178 Canadian university 
students, along with the other 56 FF-NPQ items and the NEO-FFI. 

Our intent was to select the 12 best Neuroticism items from the set of 13 new 
items plus 8 existing items. To this end, we factored the 21 items to determine if 
there was any tendency of the items to cluster into correlated facets of Neuroticism. 
We found that 14 of the items strongly defined four factors, with three or four items 
on each factor. Those four factors were clearly related to phobic behavior, paranoia, 
depression, and the need for succorance, respectively. Therefore, we decided to se- 
lect the best three items for each of these four facets to be included in our revised 
12-item Neuroticism factor scale. 

When scored as separate three-item scales, the four Neuroticism facet scales all 
intercorrelated positively, ranging from .31 to .44 with the mean of .36. When all 12 
items were combined into an overall Neuroticism factor scale, its reliability in our 
sample of respondents was .81. Furthermore, the factor scale's correlation with the 
NEO-FFI Neuroticism scale was .57. Our revised 12-item Neuroticism scale, there- 
fore, now has psychometric properties very similar to those of the other 12-item FF- 
NPQ scales (see Table 5 and Table 6). 
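
As a rough plausibility check on these figures, the Spearman-Brown prophecy formula predicts how reliability changes when a scale is lengthened. The Python sketch below applies it to the original eight-item scale. Note the assumptions: the formula presumes that added items are parallel to the existing ones, whereas the revised scale deliberately broadened its content and was evaluated in a different sample, so this calculation is offered only as an illustration and was not part of the reported analysis. The function name spearman_brown is hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a scale is lengthened by length_factor."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)

# Original 8-item Neuroticism scale (reliability .75) lengthened to 12 items.
predicted = spearman_brown(0.75, 12 / 8)  # about .82
```

The predicted value of about .82 is close to the reliability of .81 observed for the revised 12-item scale.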


Directions for future research 


In this chapter, we have described the conceptualization and development of two 
new and novel questionnaires designed to measure the traits and the factors of per- 
sonality. The questionnaires are the NPQ and FF-NPQ, and what makes them novel 
is the fact that their items do not contain verbal content, and yet they are structured 
in the sense that respondents must choose their item responses from a list of alternatives. 
The empirical data we reviewed support the construct validity of the new nonverbal 
scales. In general, they have shown good levels of reliability, criterion validity, 
and factorial validity, both in North American and non-North American respondent 
samples. 

As we mentioned in the Introduction, a nonverbal measure of personality has 
utility for certain applications. In particular, cross-cultural studies and the assess- 
ment of different linguistic groups come to mind. But our enthusiasm for the NPQ 
and FF-NPQ questionnaires notwithstanding, we feel it is necessary to delineate 
some important limitations in the use of those measures. These limitations pertain to 
(a) the culture-freeness of the nonverbal items, (b) the use of imported or foreign 
measures of personality in different cultures, and (c) the universality and 
comprehensiveness of the Five-Factor Model of personality structure. 


On culture-free personality items 


The data presented in this chapter, which largely support cross-cultural applications 
of the NPQ and FF-NPQ, should not be interpreted as a claim for the culture- 
freeness of the nonverbal items. The nonverbal items certainly are not culture-free, 
and they may not even be culture-reduced. In fact, many of those items have culture- 
specific referents. Furthermore, some of the items clearly are most relevant to West- 
ern, educated, middle-class respondents. One item, for example, shows a person 
cooking a meal for friends on a manufactured outdoor grill, and another item shows 
someone enjoying a modern art exhibit in a museum. This content specificity may 
explain why some of the NPQ scales showed lower levels of reliability and validity 
in some non-Western samples (Paunonen et al., 1996, 2000). 

Some of the nonverbal items are clearly not likely to be endorsed by certain 
groups. And the reasons might not just be cultural, but could include political, geo- 
graphic, or economic factors as well. For example, even very aggressive people in 
some totalitarian countries might be unwilling to endorse the NPQ Aggression item 
showing a person yelling at a police officer who is writing a traffic ticket. Highly 
sentient respondents living in a desert environment might, despite their level of that 
need, tend to rate as an improbable behavior the Sentience item showing a person 
lying on a beach at the seaside. And, poor people everywhere are not likely to en- 
dorse the Social Recognition item showing a person buying an expensive watch. 

We do not intend to portray the NPQ or FF-NPQ as completely culture-free. A 
truly culture-free personality instrument might resemble something like the Ror- 
schach inkblot test. With such unstructured measures, a respondent's interpretation 
of an ambiguous image is thought to be symptomatic of some latent personality trait. 
But even such abstract nonverbal items have had problems in proving their culture- 
fairness (see Frijda & Jahoda, 1966). Nonetheless, it is our belief that the NPQ and 
FF-NPQ personality measures can be useful for research and assessment in many 
different cultural contexts. 
On imported measures of personality 


People have long been interested in the issue of the consistency of human behavior 
across cultures. Different questions related to this issue have been asked (see Pau- 
nonen, 2000). Do people engage in the same behaviors in different cultures? If those 
behaviors represent latent personality traits, are those traits themselves consistent 
across cultures? And, if such traits are organized in a structured personality network, 
is that organization a universal one? In answering questions like these, researchers 
will often export their favorite measure of personality for use in other (i.e., foreign) 
cultures. This is what we did when we used the NPQ and FF-NPQ in the cross-cultural 
studies described in this chapter. 

The present cross-cultural data were remarkably consistent in the studies we have 
reviewed here. In another article (Paunonen & Ashton, 1998), we have argued that 
such consistency supports the generality of personality traits and trait behaviors 
across cultures. In other words, it would be hard to argue that the traits or behaviors 
are not relevant to a culture if the imported measures show good levels of reliability, 
criterion validity, and factorial validity in that culture. But even if the psychometric 
properties of the imported measures were substantially different across cultures, that 
would not necessarily mean that the constructs are not relevant to those cultures. 

What if one were to find that the same personality instrument administered in two 
different cultures yielded different results? Does that mean that the personality vari- 
ables assessed using the imported measure are not relevant to one or both of those 
cultures? Not necessarily. There are many reasons why a personality inventory will 
yield different psychometric properties in different cultures. These reasons include 
item translation problems (not relevant, of course, to the NPQ or FF-NPQ), response 
style involvement, test format issues, criterion-related difficulties, and more (see 
Paunonen & Ashton, 1998). But even if such methodological reasons can be dis- 
counted as the cause of the psychometric differences, can it be claimed that the con- 
structs measured are not relevant to the cultures assessed and, therefore, that the use 
of an imported measure is at fault? 

It is generally believed that a personality measure can fail in a culture because (a) 
the items used are not relevant to the people in the culture, or (b) the trait itself is not 
relevant to that culture. Regarding the first point, there is no question that the be- 
havior exemplars of a trait vary from one culture to another and, ideally, different 
versions of the test should reflect that variation. The second point, the notion that the 
personality trait itself is nonexistent in a culture, is a much more elusive idea. If a 
trait exists in one culture but not in another it must mean that, in the latter culture, no 
postulated trait-relevant behaviors occur. 

It is our view that, although trait exemplars might very well be different in differ- 
ent cultures, most (if not all) of the postulated latent traits of personality exist in 
those cultures, and could be found with a proper program of construct validation 
(see Paunonen, 2000). However, the items in an imported personality measure, including 
the NPQ and the FF-NPQ, might not be relevant to the respondents in a particular 
culture, making it appear as if some of the personality constructs are at fault. 

We agree, in general, with those who caution against the thoughtless adoption of 
foreign measurement devices in cross-cultural research. Our reasons, however, have 
more to do with problems concerning the relevance of the test items to cultures, 
rather than with problems concerning the relevance of the personality constructs 
underlying those items. It is our suspicion that well-developed personality measures, 
measures that have demonstrable levels of construct validity across diverse cultures, 
will reveal that there are few personality traits that are not general to those cultures. 
Thus, we maintain that the constructs measured by the NPQ and FF-NPQ are rele- 
vant to most, if not all, cultures, even though some of their items might not be. 


On the Five-Factor Model of personality structure 


As its name implies, the Five-Factor Nonverbal Personality Questionnaire was developed 
in accordance with the Five-Factor Model of personality structure. The dimensions 
of personality measured by the FF-NPQ, the Big Five, are ubiquitous, having 
been identified in many disparate cultures, using many different personality assess- 
ment instruments, by many independent researchers. Moreover, the data reviewed in 
this chapter support our claim that the FF-NPQ scales do a reasonable job at meas- 
uring those personality dimensions. This being said, our present test construction 
efforts should not be interpreted as suggesting that measurement of the Big Five 
factors alone is sufficient for adequate personality assessment. 

We have reported, in several studies (Ashton, 1998; Ashton, Jackson, Paunonen, 
Helmes, & Rothstein, 1995; Paunonen, 1998; Paunonen & Ashton, 2001a; Paunonen 
& Ashton, 2001b; Paunonen & Nicol, 2001; Paunonen, Rothstein, & Jackson, 1999), 
the fact that the Big Five factors of personality do a reasonably good job of predict- 
ing criterion behaviors of some social significance (e.g., smoking behavior, alcohol 
consumption, grade point average). However, those studies have also shown that one 
can do better if one also considers behavior variation that is not part of any Big Five 
factor. 

Where does one find behavior variation independent of the Big Five factors? In at 
least two places. First, there are traits and dimensions of personality that lie largely 
beyond the Big Five factors' sphere of influence. For instance, there is the sixth big 
factor of personality reported by Ashton, Lee, and Son (2000), and the 10 or so 
lower level traits described by Paunonen and Jackson (2000), none of which fit well 
within the factor space of the Big Five. Second, we can find such variation in the 
traits that are facets of the Big Five factors, but with Big Five variance partialed out. 
That is, a Big Five factor's constituent traits invariably contain significant amounts 
of trait-specific variance. And our empirical results clearly indicate that that specific 
variance, which by definition is independent of the traits’ common variance, is often 
predictive of some criterion variables of interest, sometimes adding substantially to 
the accuracy of prediction equations based on Big Five factors alone (e.g., see Pau- 
nonen & Ashton, 2001b). 
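
The logic of this second source of variation, facet-specific variance with Big Five variance partialed out, can be illustrated with a small simulation in Python. Everything here is an assumption made for illustration: the data are simulated, the effect sizes are arbitrary, the function names (multiple_r, residualize) are hypothetical, and treating factor scores as independent of the facet's specific part is a simplification. It is not a reanalysis of any study cited above.

```python
import numpy as np

def multiple_r(predictors, y):
    """Multiple correlation between y and its least-squares prediction."""
    X = np.column_stack([np.ones(len(y)), predictors])  # add intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.corrcoef(X @ beta, y)[0, 1]

def residualize(facet, factors):
    """Part of a facet score that is linearly independent of the factors."""
    X = np.column_stack([np.ones(len(facet)), factors])
    beta, *_ = np.linalg.lstsq(X, facet, rcond=None)
    return facet - X @ beta

# Simulated data only.
rng = np.random.default_rng(1)
n = 500
factors = rng.normal(size=(n, 5))              # stand-ins for Big Five scores
specific = rng.normal(size=n)                  # facet-specific variance
facet = 0.6 * factors[:, 0] + 0.8 * specific   # facet = common part + specific part
criterion = 0.3 * factors[:, 0] + 0.3 * specific + rng.normal(size=n)

r_factors = multiple_r(factors, criterion)     # five factors alone
facet_resid = residualize(facet, factors)      # Big Five variance partialed out
r_both = multiple_r(np.column_stack([factors, facet_resid]), criterion)
```

Because the criterion in this simulation depends partly on the facet's specific variance, adding the residualized facet to the five factors raises the multiple correlation; with real data the size of any such gain is an empirical question.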
In summary, by considering only the Big Five factors as predictors of behavior, 
one sacrifices optimal levels of behavior predictability. As a consequence, one also 
sacrifices a commensurate amount of behavior understanding. Our recommendation 
is that, when the assessment circumstances will allow, measurement of the Big Five 
factors of personality should be supplemented with an assessment of the individual 
Big Five facet variables. 
The preparation of this chapter was supported by the Social Sciences and Humani- 

Chapter 9 


The Global Personality Inventory (GPI) 


Mark J. Schmit 
Jenifer A. Kihm 
Chet Robie 


Introduction 


The Global Personality Inventory (GPI) is a measurement tool specifically devel- 
oped for work-related use by psychologists working in or with organizations. It was 
designed for applications such as pre-employment selection, developmental assess- 
ment, coaching, and succession management. It was originally developed for inter- 
nal use at an international consulting firm, Personnel Decisions International (PDI). 
As part of a recent merger/acquisition deal, the instrument became the property of 
ePredix, Inc., though PDI is still the primary user of the tool, using it in assessment 
center work around the world. It will soon be widely available for purchase and use 
by practitioners; academic uses are currently permitted and encouraged (contact 
nigel.dalton@epredix.com). 

The development plan of the GPI included two major ways in which "global" 
would apply to the final instrument. First, it was designed to be an omnibus measure 
of personality based on the Big-Five factor structure. Second, the GPI was devel- 
oped using an atypical approach that involved psychologists from around the world 
contributing to the construct development, item writing, and pilot testing. Statistical 
methods were used to create a final version of the test that showed scale invariance 
across cultures. In these ways, the GPI is truly a global measurement tool. The 
original development of the GPI was reported in Schmit, Kihm, and Robie (2000). 


1 A large portion of this paper is reprinted from Schmit MJ, Kihm JA, Robie C. (2000). Development of 
a Global Measure of Personality. Personnel Psychology, 53(1), 153-193. Reprinted with permission from 
Personnel Psychology. 


Big Five Assessment, edited by B. De Raad & M. Perugini. © 2002, Hogrefe & Huber Publishers. 
The approach to measurement 


Personality tests have typically been developed in a single country and are then 
transported to other countries, an approach known as an imposed-etic strategy 
(Berry, 1969). The best measures have very good psychometric evidence to support 
the use and interpretation of the measures for many different applications in the 
home country of the original development studies. Once a measure has proven use- 
ful in its home country, an effort often begins to translate the measure for use in 
other countries. Many personality tests have been developed in the United States and 
were then transported through translated measures to other countries in this manner 
(e.g., MMPI, CPI, 16PF). There is evidence that this approach has been relatively 
successful in providing measurement instruments that demonstrate similar psycho- 
metric properties across cultures. 

A good example is provided by Costa and McCrae's NEO-PI-R, a measure based 
on the Five-Factor Model of personality. Recent evidence provided by these re- 
searchers suggests that the NEO-PI-R may be useful in differentiating individual 
differences in personality across cultures (see McCrae & Costa, 1997, for a review). 
The instrument has been found to show similar psychometric properties in several 
countries around the world (e.g., China, Korea, Russia, Germany, Holland, Israel, 
Philippines, Japan, and Portugal). Several other measures have enjoyed similar success 
(e.g., California Psychological Inventory (CPI); Sixteen Personality Factor 
Questionnaire (16PF); Occupational Personality Questionnaire (OPQ); Minnesota 
Multiphasic Personality Inventory (MMPI); see Katigbak, Church, & Akamine, 
1996, for a review). 

Despite the apparent transportability of a measure across cultural lines, there are 
some caveats to this approach. First, items developed in a particular country might 
better represent the construct being measured than items written in a different country 
that were translated to the language of the import country. That is, personality 
may not be different across cultures, but expressions of personality are highly likely 
to differ (Church & Katigbak, 1988). Second, exported instruments are often 
changed significantly as they are transported from country to country, leaving one 
uncertain about the comparability of measures being used across different language 
versions (Hambleton & Kanjee, 1995). Finally, many instruments developed for 
clinical use have limited coverage and adaptability to work done by applied psy- 
chologists in work contexts. Thus, we designed a strategy for developing a measure 
of personality to be used by applied psychologists world-wide that involved psy- 
chologists from many cultures in all phases of the development process, including 
construct identification and definition, item development, instrument translation, 
data collection, construct equivalence studies, and validation. 

The design of our development effort was fashioned after models that have at- 
tempted to combine both etic and emic approaches to develop a derived etic measure 
(Berry, 1989; Davidson, Jaccard, Triandis, Morales, & Diaz-Guerrero, 1976; Triandis, 
Malpass, & Davidson, 1971). An emic approach is taken from within a culture, 
while an etic approach examines many cultures from a perspective outside the cultural 
systems (Berry, 1969; Pike, 1966). Combining the approaches in test development 
has been shown to lead to the most useful common, or universal, measure 
(Berry, 1989; Church & Katigbak, 1988). 

In the development of our new cross-cultural measure of personality, we also at- 
tempted to use all of the methods that have been designated as best practices in such 
an effort. Accordingly, we used the International Guidelines for Adapting Educational 
and Psychological Tests (Hambleton, 1994; Hambleton & Kanjee, 1995) as 
benchmarks for each step of our development process. The objective was to develop 
a measure with a cleaner factor structure, better reliability, less differential item and 
test functioning, and ultimately better validity for the inferences made from the 
measure when used by applied psychologists than tests developed and transported 
by less rigorous methods. 

There are several reasons why a common measure of personality for use in multi- 
ple countries might be highly desirable. An international industrial-organizational 
psychology consulting firm (such as Personnel Decisions International; PDI) that 
uses personality measures for the assessment of employees in client organizations is 
an example of an organization that would seek a common measure to be used in its 
offices around the world. This measure could be used for selection, development, 
coaching, and feedback purposes. For a firm such as this, a common measure with 
strong psychometric properties across cultures would allow the firm to establish 
norms at local, country, continent, and global levels. Having these types of norms 
could facilitate cross-cultural assignment and development of executives. Common 
administration systems and software could be developed around such a measure. 
Many other administrative issues could be resolved with a common measure. 

Most important, however, is the potential gain in the ability to do cross-cultural 
research and comparisons of individuals. Currently, applied psychologists typically 
use transported instruments that have been modified in multiple ways across multi- 
ple instruments. Thus, cross-cultural research and cross-cultural applications are 
limited. When cross-cultural research is conducted with instruments demonstrating 
varying degrees of moderate to low psychometric quality, including differential item 
and test functioning across cultures, mean differences between cultures, or infer- 
ences made about differences or variations of scores of individuals from different 
cultures are tenuous at best and usually are not interpretable at all. 

In the remainder of this chapter, the development of a global personality inven- 
tory (i.e., the Global Personality Inventory; GPI) will be described. This instrument 
is intended to be a measure of personality that will prove useful in the practices of 
one particular world-wide provider of industrial-organizational psychological appli- 
cations and services. However, the process we undertook and the methods we em- 
ployed in developing this instrument should prove useful as a model in developing 
any instrument that measures a common set of constructs that can be employed in 
many different locations throughout the world. 
Instrument development 


An international group of GPI development teams was assembled. These teams in- 
cluded consultants and researchers affiliated with the above-mentioned international 
consulting firm from around the world, in addition to external researchers from vari- 
ous universities (see acknowledgements). The roles of external, academic research- 
ers were defined to include providing content and research expertise, providing ac- 
cess to research samples, making methodological recommendations, and partnering 
with consultant team members on presentations and papers to be delivered at profes- 
sional conferences and published in academic journals. Ten teams were assembled 
with a total of 70 team members, most of whom were Ph.D. or Masters level psy- 
chologists. All of the consulting firm's current global offices had representatives on 
these teams. Psychologists on the teams were from the USA, UK, France, Belgium, 
Sweden, Germany, Spain, Netherlands, China, Japan, Singapore, Korea, Argentina, 
and Colombia. 

A personality measure to be used worldwide should be well grounded in person- 
ality theory. In addition, we believed that a theory of job performance should be kept 
in mind during the development, as the intended application of the instrument was to 
be focused on the work life of respondents. This stage of development was driven by 
theory, previous research, and rational thought, rather than by sheer empirical data. 
The outcome of this development phase was to be a conceptual model including the 
constructs to be measured by the GPI. In addition, construct definitions were to be 
developed in this stage which would serve as the basis for item writing. 


Selection of broad models to structure the GPI 


Personality model 


The personality model of choice today, among both personality and industrial- 
organizational researchers, is the Five-Factor Model of personality. The Five Factors 
include Extroversion, Agreeableness, Conscientiousness, Emotional Stability, and 
Openness to Experience. Some researchers have suggested that the model is both too 
broad and not fully representative of human personality (e.g., Hough, 1998). This 
may or may not be true, but this does not limit its usefulness as an organizing taxon- 
omy for conceptual variables at a level below the Five Factors. There is much 
greater agreement among researchers that applied measurement of personality can 
be very useful at a level below the Five Factors (Paunonen, 1998). At this level, fac- 
ets of the larger Five Factors can be thought of as being both primarily and secon- 
darily related to higher level variables. That is, a facet that is primarily related to 
Extroversion may be secondarily related to Conscientiousness. Desire for Achieve- 
ment/Advancement is a good example of this type of facet because it has been found 
to load on both the Extroversion and Conscientiousness factors (Hough, 1998). 
Linking measures of GPI facets to the Five Factor model was thought to be a pru- 
dent method for developing construct validity evidence for the instrument. Addi- 
tionally, having a broad factor structure would allow for scales to be combined into 
broad predictor composites, as broad predictor composites have been found to be 
very useful in predicting broad criteria, such as overall performance (Ones & 
Viswesvaran, 1996). 

Another reason for using the Five Factor Model as an organizing taxonomy is the 
large base of research that exists for this model. Costa and McCrae, of the National 
Institute on Aging, have shown the model to fit almost all of the major personality 
inventories used today. In addition, recent evidence provided by these researchers 
suggests that the Five Factor model is invariant across cultures (McCrae & Costa, 
1997). The model has been found to hold in countries with very diverse cultures 
around the world (e.g., United States, China, Korea, Russia, Germany, Holland, 
Israel, Philippines, Japan, and Portugal). 


Job performance model 


The GPI will be used primarily in the context of work, and therefore, we wanted to 
keep in mind a model of job performance throughout the development process. 
Years of research at PDI have resulted in several variations of a model of job per- 
formance (cf. Davis, Hellervik, Skube, Gebelein, & Sheard, 1996). The core perform- 
ance factors of this model are highly consistent with current research suggesting 
core job performance elements for most jobs (e.g., Campbell, Gasser, & Oswald, 
1996; Campbell, McCloy, Oppler, & Sager, 1993). Thus, the core performance fac- 
tors of these models were used as an organizing structure for how the personality 
constructs relate to work behaviors. The development teams used the core model as 
a starting point and made slight modifications through consensus to arrive at the 
final performance model. The core performance factors included: Administrative, 
Thinking, Interpersonal, Leadership, Work Orientation, Self-Management, and Mo- 
tivation. 


Conceptual development of facet constructs for the GPI 


Conceptual development of facet scales 


A literature review was conducted, gathering information on all facet scales that 
have been linked to the Five Factor model. Particular attention was paid to those 
scales previously used in a work context. During the literature review we also sought 
to identify any international papers describing the conceptual and empirical linkages 
of facets to the Big Five. A review of the major personality inventories used today in 
business and industry was also conducted (e.g., California Psychological Inventory 
(CPI), Sixteen Personality Factor Questionnaire (16PF), Occupational Personality 
Questionnaire (OPQ), Myers-Briggs Type Indicator (MBTI), Hogan Personality 
Inventory (HPI), NEO-Personality Inventory (NEO-PI)). From these reviews, a list 
of potential facet constructs was developed and conceptually defined. An initial set 
of 29 facets and definitions was created. 


World-wide conceptual input 


The organizing models (i.e., personality and performance), construct list, and defi- 
nitions developed from the literature search were then sent to our global teams for 
review. They were instructed to search their own local literature for alternative mod- 
els, facet to Big Five links, and construct definitions. The team members were also 
asked to use their own experience as individual assessors to identify potential per- 
sonality constructs or syndromes that they have found to be empirically or clinically 
tied to work successes and failures. They were referred to the job performance 
model as a starting point for thinking about predictor constructs. Input on both the 
Five Factor personality model and the performance model used as the organizing 
structures was also sought. Finally, they were asked to critique the list of 29 previ- 
ously developed facets and definitions for construct coverage, cultural appropriate- 
ness, and consistency with the literature. 

A global consensus was quickly reached that both the Five Factor model of per- 
sonality and the broad performance model were appropriate for cross-cultural use. 
The only modification was made to the performance model where the Work Orien- 
tation factor was divided into two orientations, Collective and Individual Orienta- 
tion, consistent with cultural differences. This distinction followed the Hofstede 
(1980) conception of these differences. Thus, the distinction of collectivism versus 
individualism refers to the dependence on others versus the independence from oth- 
ers. 

Several rounds of input were required to reach consensus on constructs at the 
facet level. For each round of input, two subject matter experts received the devel- 
opment team feedback and incorporated it into the definitions and models. Devel- 
opment teams were then asked whether the changes were appropriate for the con- 
struct as defined in their own cultures. Consensus was sought after each round of 
revisions before moving on to the next round. 

In the first round of construct input, each of the ten teams provided detailed cri- 
tiques and suggestions for improving the initial set of facets. This input was used to 
build a second set of facet constructs theoretically linked to both the Big Five model 
of personality and a PDI job performance model. This set of facet constructs and 
definitions was then submitted to the teams for further review. This second round of 
reviews produced substantially less input, though it was nonetheless important. A major ad- 
dition that was suggested in the second round of reviews was the incorporation of 
management failure or derailing constructs. The description of these constructs sug- 
gested that the constructs would differ in two significant ways from the previously 
proposed constructs. First, it was clear that these constructs were not pure trait con- 
structs (i.e., not uni-dimensional), but were instead composites of trait constructs 
(i.e., multi-dimensional) representing syndromes. And second, they were constructs 
that represented underlying mechanisms that would trigger behaviors outside the 
realm of behaviors considered "normal" in the work place. In other words, these 
were constructs that could lead to, or are associated with, dysfunctional work be- 
haviors. Thus, it was determined that the items to be used in measuring these con- 
structs would be more extreme in nature than those used in traditional measures of 
normal adult personality. 

Two additional constructs, Impressing and Self-awareness/Self-insight, were 
added in the third round of reviews that represent similar trait composites. The Im- 
pressing scale was added as a measure similar to social desirability measures. It was 
defined as a syndrome where individuals who possess high levels of this character- 
istic are likely to try to impress others in many situations, including the testing 
situation. Thus, it was thought to be a multi-dimensional measure of substance, not 
just responding style (cf. McCrae & Costa, 1983). The second multi-dimensional 
construct of Self-awareness/Self-insight was defined consistent with Buss' concep- 
tion of this set of characteristics that he felt clearly went beyond the scope of the 
Five Factor model of personality (Buss & Finn, 1987). 

In summary, the second round of global construct reviews resulted in a set of 33 
revised and new facet constructs at the trait level and four management failure con- 
structs. This set of constructs was then submitted once again to the GPI development 
teams. Input from the third round of reviews led to a final set of 32 facet constructs 
linked to the Five Factor model, five management failure constructs, and two addi- 
tional compound trait composites (see Table 1 for the final set of constructs). This 
set was submitted back to the teams and final approval of the conceptual model was 
reached by consensus. 


Item development for construct scales 


The goal in the development of a personality inventory for PDI to use worldwide 
was to create a set of scales with common items that will be useful in any culture, 
both today and in the future. As noted earlier, a combined emic and etic approach 
was used. 


Item format 


The type of item format(s) to be used on the GPI was explored at this point in the 
development process. A literature review was conducted to identify potential item 
types and the advantages and disadvantages of each. Alternatives studied included 
statements, paired statements, adjectives, bipolar adjectives, and adjective triads. In 
addition, how the items were to be scaled was explored at this point (e.g., T/F or 
Likert-Type scaling). Based on the review, it was decided that general statement 
type items would be developed and scaled with a 5-point Likert-type scale, ranging 
from Strongly Agree to Strongly Disagree. This decision was based on the desire to 
have relatively short scales with good variability. Other formats such as true/false or 
forced choice did not appear to fit this objective. 


Development of items for each construct 


Each GPI development team was asked to hold a planning meeting for item writing. 
In these meetings the teams discussed the task and how the team would accomplish 
the task. Team leaders were asked to make item-writing assignments capitalizing on 
the strengths of each member in the group. That is, persons with the best knowledge 
of a particular construct were to be assigned to write items for that construct. After 
items had been written, the teams were asked to reassemble to discuss and edit the 
items. Each team was asked to write six to eight items per construct for the 39 con- 
structs. 

Several item writing guidelines were given to the teams. Below is the text of the 
instructions given to the teams: 


When writing items: 

a) All items should be written in the first person - e.g., "I am..." "I like..." "For 
me..." etc.; 

b) All items should be written in such a way that they do not contain phrases that 
are culture bound. The California Psychological Inventory (CPI) item that asks 
about preferences for Abraham Lincoln versus George Washington is a good ex- 
ample of a culture bound item; 

c) Items should be written to target a single underlying personality construct. Think 
about your item from the perspective of the test-taker. Think to yourself: "When a 
person responds to my item, what construct are they trying to portray information 
about in their response?" Remember also that this measure will be used in em- 
ployment-related interventions. This may slightly change the way respondents 
think about your item. Again, try to think of how the context presses both the re- 
sponse and the construct being portrayed by the respondent. You should think 
about both the personality construct the item should measure and the perform- 
ance construct(s) related to the personality construct. In other words, what per- 
formance construct(s) should be predicted by the scale and items in it?; 

d) When you write personality items, think about someone you know who is very 
high or low on the trait for which you are writing items. Think about the beliefs, 
attitudes, behaviors, mannerisms, feelings, and desires that this person projects 
as a result of the trait. Do some people watching... you will learn a great deal 
about personality and individual differences associated with specific traits. Think 
back on your experiences in assessments; 

e) Keep items fairly short and direct; 

f) Consult with academic partners if you feel they can help you in the item writing 
process. Consult textbooks on personality testing, psychometrics, and personality 
theory for additional help; 
g) Write the items in your native language first. Then, translate them to English. Get 
agreement from your team about whether the essence of the item was translated 
properly. This is a very important step. Everyone should agree that the item is 
translated properly or it should be dropped. 


An attempt was made to keep the guidelines to a minimum so as not to structure 
the task to the point that creativity would suffer. The development teams wrote a 
total of 3,012 items. Obviously this number of items had to be reduced before em- 
pirical data could be collected. 

The first step in the reduction process involved a subject matter expert review of 
the items. A group of 12 development team members, who were considered to be 
personality and cross-cultural experts, met to reduce the set of items. The goal was 
to reduce the number of items to 20 items per facet. Several criteria were used in the 
reduction process. First, items were dropped if they were deemed to be related to 
more than one of the facets. Second, items were dropped if they were deemed to be 
items that were inappropriate for any particular level of employees (particularly, we 
were looking for items that would be appropriate for executive level test-takers, as 
well as other levels of employees in organizations). Third, items were dropped if 
they contained phrases or words that appeared to be very culture specific. Fourth, 
items that were deemed likely to have significant translation problems were 
dropped. Fifth, items were chosen to cover all aspects of the definition of the facet 
(e.g., clearly redundant items were reduced to the best one item of the redundant 
set). Sixth, an attempt was made to include a relatively equal number of items from 
each culture represented by the item writers. That is, a stratified sample of items 
from across the cultures represented was sought as an outcome. (We felt that this 
was core to developing scales that would operate effectively across cultures. Within 
a scale, small subtleties within different cultures might be detected by items devel- 
oped in that culture, but as a whole, the scale would operate similarly across all cul- 
tures.) Finally, job performance factors were considered in the selection of items. 
Items that were deemed unlikely to be related to job performance or that might be 
found to be offensive to test-takers in a work context were excluded. 

At least three subject matter experts examined the items written for each facet 
and made recommendations for items to be dropped. Based on consensus of the 
subject matter experts, a pool of 802 items was identified as meeting the criteria for 
item inclusion. Next, the items were randomized and three subject matter experts 
sorted the items back into the facet categories. Items were retained if all three ex- 
perts sorted the items into the same category. A total of 79 items were dropped as a 
result of this process, leaving a pool of 723 remaining items. 
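
The unanimous-sort rule described above can be sketched in a few lines. This is only an illustration: the item identifiers, the facet labels assigned to them, and the function name are hypothetical, and the sketch encodes nothing beyond the stated rule that an item is kept when all three experts place it in the same category.

```python
def retain_unanimous(sorts):
    """Keep items that every expert sorted into the same facet.

    `sorts` maps an item id to the list of facet labels assigned by each expert.
    """
    return {item: labels[0]
            for item, labels in sorts.items()
            if len(set(labels)) == 1}


# Hypothetical sorting results from three experts (ids and labels are made up).
example_sorts = {
    "item_001": ["Trust", "Trust", "Trust"],
    "item_002": ["Trust", "Empathy", "Trust"],      # disagreement, so dropped
    "item_003": ["Optimism", "Optimism", "Optimism"],
}

retained = retain_unanimous(example_sorts)
print(sorted(retained))  # ['item_001', 'item_003']
```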

The 723 items, along with construct definitions, were then sent back to the devel- 
opment teams for review. The teams were asked to review the items using the same 
inclusion/exclusion criteria the subject matter experts used. In particular, they were 
instructed to identify items that they believed would not translate well into their lan- 
guage or were not congruent with the measurement of the construct of interest in 
their culture. They were also asked to make editorial suggestions on the list of items. 
As a result of this review, 216 items were dropped from the pool of 723, leaving 507 
items. Many of the 507 items were modified slightly to fulfill the editorial requests. 
At this point, the initial, or Alpha, version of the test had been solidified. It con- 
tained 39 scales (32 trait scales and 7 syndrome scales) with 13 items each. 
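
The item counts reported in this section are mutually consistent, which a short arithmetic check makes explicit. All figures are taken from the text; only the variable names are introduced here.

```python
# Counts reported in the text.
items_written = 3012
after_expert_review = 802
dropped_in_sort = 79
dropped_by_teams = 216
n_scales, items_per_scale = 39, 13

after_sort = after_expert_review - dropped_in_sort   # pool after the unanimous sort
alpha_pool = after_sort - dropped_by_teams           # pool after the team review

print(after_sort, alpha_pool, n_scales * items_per_scale)  # 723 507 507
```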


Item translation 


The Alpha version of the GPI was translated from English into nine languages: 
British English, German, French, Spanish, Dutch, Swedish, Japanese, Chinese, and Korean. 
The item translation process followed a series of steps and included several review 
processes. First, a team consisting of a translator, editor, desktop publisher, and proj- 
ect manager was assembled. The English documents went to the translator first. 
When he/she completed the translation, it was sent to the editor who read through it, 
made comments, corrections, and suggestions. The editor then returned the docu- 
ments with his/her recommendations to the original translator. The translator made 
any necessary adjustments in the text and sent the translation to be typeset. When 
the layout was completed, the document was returned to the translator to make cer- 
tain text was not lost or misplaced during the typesetting process. All translators and 
editors were accredited translators. Additionally, they were all native speakers of the 
language in which they work, all were educated specifically in translation, and all 
were tested for competency. They were not, however, psychologists. Therefore, once 
the translator approved the final version, the items were sent back to our psycholo- 
gists on the GPI development teams. The items were reviewed by bilingual psy- 
chologists on our development teams and edited once again, if warranted. 

The development teams took various approaches to ensure the quality of the 
translations. In all cases, at least two psychologists compared the original English 
version of the items to the translated version. These reviewers looked for linguistic 
correctness and psychological fidelity of the translated item with the original item. 

In China and Japan, a much more complex process was used because of the more 
complex nature of their languages and dissimilarity from English. In China, a double 
translation process was conducted. The translation process described above was 
conducted in the US, while a similar process was conducted in China. Then a bilin- 
gual Chinese psychologist and his team of graduate students reviewed both sets of 
translations against each other and against the original English version. They made 
editorial and content changes based on these reviews. 

In Japan, two bilingual psychologists compared the initial translation to the Eng- 
lish version. After corrections were made and checked, two bilingual native English 
speakers did a back translation and then a separate bilingual psychologist compared 
the back translation to the original English version and further edits were made. The 
translations were then reviewed and revised by the original review team plus a third 
person for objectivity. 

After all the development teams had finished reviewing and revising the transla- 
tions, the materials were sent back to the original translation team. The editor and 
desktop publisher then put the translated items into a booklet form. When the layout 
was completed, the document was returned again to the original translator to make 
certain text was not lost or misplaced during the typesetting process. When the 
translator approved the final version, it was determined that the alpha version of the 
GPI was ready to go to the initial data collection phase. 


Item testing 


The goals of the item testing phase of the GPI development project were to develop 
a test: a) of manageable length (i.e., no more than 10 items per scale), b) that func- 
tions similarly across cultures, c) consisting of a common set of items in all lan- 
guage versions, and d) that is able to differentiate among individuals with different 
performance potential within and across levels of employees in organizations. Both 
traditional and modern statistical methods were used in this process. Item response 
theory (IRT) analyses were among the techniques used in the item testing. Essen- 
tially, the purpose of the IRT analyses was to identify items and scales that meas- 
ured the same things, in the same way, for individuals from different cultures with 
the same true scores on a construct. 


Data collection 


Data were collected in many samples throughout the world to maximize the hetero- 
geneity both within and across cultures. Within cultures, we tested individuals 
across as many different levels of employees as possible. Across cultures, we col- 
lected data in as many diverse countries as possible. These countries included China, 
Japan, Singapore, Indonesia, Sweden, United Kingdom, Netherlands, Belgium, 
France, Switzerland, Spain, Colombia, and the United States. Given that the primary 
clients of PDI are middle management and higher executives, over-sampling was 
conducted for these sub-groups. Sample sources included primarily MBA or other 
students with work experience, local community samples, and individuals recruited 
by the GPI development team members (e.g., friends, relatives, and clients). The 
total sample consisted of over 2,000 individuals, with just over 50 per cent from the 
middle management level and above. 

The three largest samples were from the United States (N = 303), China (N = 
432), and Spain (N = 463). These three samples were used for the majority of the 
item testing analyses. In addition to being the three largest samples, we felt these 
samples represented sufficiently distinct cultures that allowed us to identify those 
items that negatively affect comparability across cultures. 


Data analysis 


For practical purposes we decided that the final scales would have no more than 10 
items, and fewer if possible. Therefore, our first step was to reduce the 39 scales 
from 13 items to 10 items each. Scale reliability estimates (Cronbach's coefficient 
alpha) and item-total correlations were calculated for all scales and items separately 
for each set of data. The three items with the lowest item-total correlations across the 
three samples were dropped. Results from the three sets of analyses were remarkably 
similar. Identifying the three worst items was very straightforward. Additionally, 
two scales, Goal-Directed Thinking and Conformance, had very low internal con- 
sistency in all three country samples, even after the three items were dropped. These 
scales were flagged for possible exclusion in the final test. 
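
The classical item analysis described above can be illustrated with a minimal sketch for a single sample. It assumes a small, made-up response matrix (rows are respondents, columns are items) and uses corrected item-total correlations, that is, each item against the sum of the remaining items; the text does not say whether corrected or uncorrected correlations were used, and all function names are hypothetical.

```python
from statistics import mean, pvariance


def cronbach_alpha(rows):
    """Coefficient alpha for a list of respondents' item-score lists."""
    k = len(rows[0])
    item_vars = [pvariance([r[j] for r in rows]) for j in range(k)]
    total_var = pvariance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson correlation of two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(rows):
    """Correlation of each item with the sum of the remaining items."""
    k = len(rows[0])
    return [pearson([r[j] for r in rows], [sum(r) - r[j] for r in rows])
            for j in range(k)]


def worst_items(rows, n_drop):
    """Indices of the n_drop items with the lowest corrected item-total r."""
    r = corrected_item_total(rows)
    return sorted(range(len(r)), key=lambda j: r[j])[:n_drop]


# Made-up data: six respondents, four items; the last item is mostly noise.
responses = [
    [1, 1, 2, 3],
    [2, 2, 2, 1],
    [3, 3, 4, 5],
    [4, 4, 3, 2],
    [5, 5, 5, 4],
    [2, 3, 2, 5],
]

print(round(cronbach_alpha(responses), 2))
print(worst_items(responses, 1))
```

In the study the item-total correlations were compared across three samples before items were dropped; the sketch covers one sample only.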

Next, the data from the three countries were each split into 39 separate data sets 
(117 data sets in total). Each of these data sets contained data from one scale and one 
culture only. This was done so that we could treat each scale as a separate, uni- 
dimensional test for item response theory (i.e., IRT) analysis — a requirement for 
most IRT models (Thissen & Steinberg, 1988). For this project, we used Samejima's 
(1969) graded response model. This model is appropriate for responses made on 
Likert-format scales, which is the scale format used in the GPI. The graded response 
model describes the relation of an item to an underlying psychological construct 
with a slope (i.e., discrimination) parameter (a), that indicates the strength of the 
relation of an item with a construct, and threshold parameters (b_i), that represent 
response endorsement probabilities. 
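
As an illustration of how the slope and threshold parameters enter the model, the sketch below computes category probabilities under the logistic form of the graded response model. It assumes categories scored 0 to m, a logistic metric without the 1.7 scaling constant, and made-up parameter values; the function names are hypothetical and this is not the estimation software used in the study.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under a graded response model.

    `thresholds` must be in increasing order; m thresholds imply m + 1
    ordered response categories (0 .. m).
    """
    # Boundary curves: probability of responding in category k or higher.
    boundary = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum = [1.0] + boundary + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def expected_score(theta, a, thresholds):
    """Expected item score at a given trait level."""
    probs = grm_category_probs(theta, a, thresholds)
    return sum(k * p for k, p in enumerate(probs))


# A three-category item with hypothetical parameters.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 1.0])
print([round(p, 3) for p in probs])
```

A larger slope makes the boundary curves steeper, so the item separates respondents near a threshold more sharply; shifting the thresholds moves the trait levels at which adjacent categories become more likely.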

Item parameters were calculated for each item in all three samples and test scores 
were created for all subjects using these item parameters. Test information curves 
were then produced to visually examine the usefulness of the IRT scoring algorithm 
based on these item parameters. When using the full response scale (five response 
options), the test information curves were not normal and difficult to interpret. We 
thought this might be due to unstable parameter estimates for the extreme responses 
(i.e., strongly disagree and strongly agree) in the three smallish samples. To test this 
conjecture, we collapsed the raw data from a five-point response scale to a three- 
point response scale by grouping strongly agree and agree responses together and 
strongly disagree and disagree responses together. The middle, or neutral, response 
was not changed. New parameter estimates and subject test scores were created. 
New test information curves were produced. The new curves appeared to be much 
more normal and the parameter estimates more stable. Chi-square fit statistics that 
quantify the difference between observed data and the data reproduced by the esti- 
mated item parameters suggested that the three-option scoring actually fit the data 
better for graded response model purposes than the five-option scoring (see Dras- 
gow, Levine, Tsien, Williams, & Mead, 1995, for a full explication of this method). 
In fact, using this method to gauge the fit of the data to the IRT model, all of the 
scales (including the trait composite scales) using 3-point scoring fit the assumptions 
of the IRT model at levels greater than those evidenced in other published studies 
using this same method on Likert-type personality data (cf. Zickar & Drasgow, 
1996; Zickar & Robie, 1999). All further IRT-based analyses conducted to test for 
differential item and test functioning (i.e., cross-cultural differences) were based on 
parameter estimates from the collapsed data sets. 


Differential item and differential test functioning 


Differential item functioning (DIF) refers to the situation in which a particular item 
has different response functions for different groups of people such that an individ- 
ual from one group has a different expected probability of choosing a particular op- 
tion than an individual from another group, even though the two individuals possess 
the same level of the trait (θ) being examined (Camilli & Shepard, 1994). For exam- 
ple, if an individual from Spain who possesses a particular level of a trait (e.g., θ = 
1.50) has a different probability of choosing the most positive option for an item 
measuring that trait (e.g., Adaptability) compared to an individual from the US with 
the identical θ, then that item would be said to evidence DIF. 

Differential test functioning (DTF) is the scale-level analog to DIF, and refers to 
differences in expected total test or scale scores by individuals with equal standings 
on the latent trait but who belong to different subpopulations (Drasgow & Hulin, 
1990). Analysis of both DIF and DTF is important. The presence of several items 
displaying DIF, which can cancel each other, may not be indicative of serious meas- 
urement bias at the scale level. Furthermore, when using personality data for making 
personnel selection decisions, decision-makers almost always rely on information at 
the scale-level. Nevertheless, item-level differential functioning remains important 
to study because knowing the item properties (e.g., different discrimination pa- 
rameters or different endorsement parameters) that contribute to scale-level differ- 
ential functioning is helpful for eliminating DTF. 
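
The distinction between item-level and scale-level differences, and the way item-level differences can partly cancel, can be shown with a small numerical sketch. It assumes a two-item scale with three response categories per item under a graded response model; all parameter values are made up, with one item shifted harder and the other easier in the focal group.

```python
import math


def expected_score(theta, a, thresholds):
    """Expected item score (categories scored 0..m) under a graded response model."""
    # The expected score equals the sum of the boundary probabilities.
    return sum(1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds)


theta = 1.5  # the same trait level in both groups

# Hypothetical (discrimination, thresholds) pairs for two items.
reference = [(1.5, [-1.0, 1.0]), (1.5, [-1.0, 1.0])]
focal = [(1.5, [-0.5, 1.5]), (1.5, [-1.5, 0.5])]

# Item-level differences in expected score (focal minus reference).
item_diffs = [expected_score(theta, *f) - expected_score(theta, *r)
              for r, f in zip(reference, focal)]
# Scale-level difference in expected total score.
scale_diff = sum(item_diffs)

print([round(d, 3) for d in item_diffs], round(scale_diff, 3))
```

Both items function differently at this trait level, but in opposite directions, so the difference in expected scale score is smaller than either item-level difference.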

The US sample served as the reference group for all analyses. The Chinese and 
Spanish samples were compared to the US sample separately to determine whether 
DTF was present for any scale in either cross-cultural comparison. For those scale 
comparisons that evidenced DTF, we sought to determine which items were contrib- 
uting to these differences. 

Before conducting analyses to assess DTF, we linked measurement (θ) scales 
between the reference (US) and focal (Spanish or Chinese) groups. Such linking 
puts the item parameters on the same scale, which is necessary to estimate the θ dif- 
ference between the groups of respondents and to appropriately test for DIF/DTF. 
Linking of θ metrics was done using Equate 2.1 (Baker, 1997), which estimates a set 
of linear equating constants that are used to convert one set of item parameters to the 
metric of another set of item parameters (Baker, 1992). 
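
How a pair of linear equating constants is applied can be sketched as follows. The sketch assumes the usual linear relation θ* = Aθ + B between metrics, under which slopes are divided by A and thresholds are transformed like θ. The constants and item parameters are made up, the function names are hypothetical, and the estimation of A and B (the task performed by the linking software) is not shown.

```python
def link_item(a, thresholds, A, B):
    """Re-express one item's parameters on the target metric."""
    return a / A, [A * b + B for b in thresholds]


def link_theta(theta, A, B):
    """Re-express a trait value on the target metric."""
    return A * theta + B


A, B = 0.9, 0.25                      # hypothetical equating constants
a, thresholds = 1.2, [-0.8, 0.6]      # hypothetical focal-group item parameters

a_linked, b_linked = link_item(a, thresholds, A, B)
print(round(a_linked, 3), [round(b, 3) for b in b_linked])
```

Because a(θ - b) is unchanged by the transformation, the response probabilities implied by the item are the same before and after linking; only the metric differs.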

DIF statistics were computed using DFITP4 (Raju, 1998). The DIF statistic, 
NCDIF, tests for non-compensatory DIF among all items (Raju, Van der Linden, & 
Fleer, 1995; Flowers, Oshima, & Raju, 1999). When testing for non-compensatory 
DIF, one assumes that all other items in the scale are free from DIF (i.e., DIF in one 
item cannot cancel out DIF in another item). Additionally, DFITP4 computes com- 
pensatory DIF (CDIF) estimates for each item within a test (i.e., an individual GPI 
scale). DTF is the sum of these compensatory item-level DIF statistics (Raju et al., 
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Table 1. Differential item (DIF) and test (DTF) functioning results of the initial version of the GPI 


U.S. vs. Spain U.S. vs. China Both Comparisons 
Scale ‘s # DF  #DTF # DIF # DTF # DIF # DTF 
Agreeableness 
Conformance (8) 7 (6) 1 (1) 7 (6) 2 (2) 6 (4) 0 (0) 
Consideration (10) 3 (3) 1 (1) 5 (4) 3 (2) 3 (3) 0 (0) 
Empathy (8) 3 (3) 1(1). 3 (3) 1 (1) UG) - 0 (0) 
Interdependence (9) 7 (6) 2 (2) 7 (7) 1 (1) 7 (6) 1 (1) 
Openness (9) 6 (4) 3 (3) 6 (4) 1 (0) 4 (2) 0 (0) 
Thought Agility (10) 6 (4) 1 (1) 4 (4) 2 (2) 3(1) 0 (0) 
Trust (10) 5 (3) 7 (7) 6 (6) 2 (2) 2 (2) 1 (1) 
Conscientiousness 
Attention to Detail (10) 6 (5) 6(5) · 5 (5) 3 (3) 4 (3) 3 (2) 
Dutifulness (9) 5 (5) 2 (2) 6 (6) 0 (0) 4 (4) 0 (0) 
Responsibility (8) 2 (2) 2 (2) 4 (4) 1 (1) 2 (2) 1 (1) 
Work Focus (10) 5 (4) 1 (1) 9 (7) 1 (1) 4 (2) 0 (0) 
Extroversion 
Adaptability (8) 5 (5) 3 (3) 7 (6) 0 (0) 5 (4) 0 (0) 
Competitiveness (9) 5 (3) 1 (1) 6 (6) 0 (0) 2 (1) 0 (0) 
Desire for Achievement (9) 5 (4) 3 (3) 7 (7) 5 (1) 4 (3) 2 (1) 
Desire for Advancement (10) 5 (4) 4 (4) 7 (6) 2 (2) 3 (2) 1 (1) 
Energy Level (10) 6 (5) 4 (4) 5 (4) 1 (1) 3 (3) 1 (1) 
Influence (10) 5 (4) 1 (1) 7 (2) 3 (3) 5 (1) 1 (1) 
Initiative (10) 6 (2) 0 (0) 7 (6) 2 (2) 5 (1) 0 (0) 
Risk-Taking (10) 7 (2) 2 (1) 10 (9) 1 (1) 7 (2) 1 (0) 
Sociability (10) 7 (7) 7 (7) 8 (7) 2 (2) 5 (5) 1 (1) 
Taking Charge (10) 5 (3) 4 (4) 8 (6) 0 (0) 4 (2) 0 (0) 
Neuroticism 
Emotional Control (9) 6 (5) 5 (4) 8 (7) 0 (0) 5 (4) 0 (0) 
Negative Affectivity* (9) 6 (5) 3 (3) 6 (4) 1 (1) 4 (1) 0 (0) 
Optimism (10) 7 (6) 1(1) 8 (7) 3 (2) 5 (4) 0 (0) 
Self-Confidence (9) 8 (7) 2 (1) 7 (6) 4 (4) 6 (4) 1 (0) 
Stress Tolerance (10) 6 (6) 7 (5) 5 (5) 1 (1) 4 (4) 1(1) 
Openness to Experience 
Goal-Directed Thinking (8) 5 (5) 0 (0) 3 (3) 1(1) 2 (2) 0 (0) 
Independence (10) 7(6) 6(5) 6 (6) 3 (3) 4 (3) 3 (3) 
Innovativeness/Creativity (10) 6 (5) 2 (2) 7 (5) 2 (1) 4 (3) 0 (0) 
Social Astuteness (9) 4 (4) 1 (1) 7 (5) 2 (2) 4 (3) 0 (0) 
Thought Focus (9) 2 (2) 2 (2) 5 (3) 1 (1) 1 (1) 0 (0) 
Vision (9) 5 (4) 2 (2) 6 (4) 2 (2) 3 (2) 0 (0) 
Trait Composites 
Ego-Centered* (9) 8 (6) 2 (2) 9 (9) 0 (0) 8 (6) 0 (0)
Impressing* (8) 7 (5) 7 (7) 8 (6) 2 (1) 7 (5) 2 (1)
Intimidating* (9) 6 (5) 2 (1) 7 (5) 8 (8) 5 (3) 2 (1)
Manipulating* (10) 7 (6) 4 (4) 8 (5) 0 (0) 6 (4) 0 (0)
Micro-Managing* (10) 7 (6) 1 (1) 6 (5) 3 (3) 4 (3) 0 (0)
Passive-Aggressive* (8) 8 (8) 0 (0) 6 (5) 3 (1) 6 (5) 0 (0)
Self-Awareness/Self-Insight (10) 7 (7) 2 (2) 3 (3) 2 (2) 3 (3) 1 (1) 


Note: Number of items in each scale after dropping items based on classical test theory analyses in
parentheses after scale name. *High scores on these scales are undesirable. # DIF = number of
items in each scale that evidenced significant differential item functioning. # DTF = number of
items that needed to be dropped to make DTF non-significant (numbers in parentheses are based on
less stringent differential functioning criteria; see the Differential item and test functioning
section of the chapter).


1995; Flowers et al., 1999). Raju et al. (1995) proposed a χ² test for assessing the
statistical significance of the observed DTF index; the degrees of freedom for this
test equal N_F - 1, where N_F is the number of examinees in the focal group. Raju
(personal communication, October 12, 1998) currently recommends using a cut-off
of 0.096 on the DTF index for scales composed of items with five options (0.010 has
traditionally been used with 3-point scales; see Fleer, 1993; Flowers, 1995). Scales
with DTF values above this cut-off that also have significant χ² values at a p < .01
level are said to evidence DTF. Raju's (personal communication, October 12, 1998)
cut-offs for both DIF and DTF are based on an analytical formulation that is intended
to identify items and tests (i.e., scales) that evidence DIF or DTF of practical
significance. Analyses in the current study were conducted first using the 0.010
DTF cut-off and then a more liberal 0.096 cut-off. We looked at this more liberal
cut-off because the responses underlying the collapsed data were made on a 5-point
scale (we report results for both cut-offs in Table 1).

An iterative process, involving several rounds of parameter estimation and linking,
was used to identify which items had DIF (Candell & Drasgow, 1988). Items
that exhibited DIF (a χ² with p < .01 and an NCDIF value ≥ 0.010 in the first set of
analyses; a χ² with p < .01 and an NCDIF value > 0.096 in the second set of analyses)
were removed from the analyses and equating constants were re-computed
without the items that evidenced DIF. Using these new equating constants, NCDIF
statistics were re-computed. If additional items exhibited DIF at this stage, another
iteration was carried out with a new linking. This iterative process continued until no
new items with DIF were identified.
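
The purification loop described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the procedure as implemented for the GPI: the function name `purify_anchor_items` and the callable `estimate_ncdif` are hypothetical, the callable stands in for the whole re-linking and NCDIF re-estimation step, the χ² condition is omitted, and the toy NCDIF values are invented for illustration.

```python
def purify_anchor_items(item_ids, estimate_ncdif, cutoff, max_rounds=20):
    """Iteratively drop DIF items from the linking (anchor) set.

    estimate_ncdif(anchor_ids) is a hypothetical callable that re-links the
    two groups using only anchor_ids and returns {item_id: NCDIF} for every
    item. The chi-square significance check is left out of this sketch.
    """
    flagged = set()
    for _ in range(max_rounds):
        anchors = [i for i in item_ids if i not in flagged]
        ncdif = estimate_ncdif(anchors)
        newly_flagged = {i for i in item_ids if ncdif[i] >= cutoff} - flagged
        if not newly_flagged:
            break  # no new DIF items: the anchor set is stable
        flagged |= newly_flagged
    return sorted(flagged)


def toy_ncdif(anchors):
    """Invented values: removing q3 from the anchors unmasks DIF in q5."""
    values = {"q1": 0.001, "q2": 0.002, "q3": 0.050, "q4": 0.003, "q5": 0.008}
    if "q3" not in anchors:
        values["q5"] = 0.015
    return values


flagged = purify_anchor_items(["q1", "q2", "q3", "q4", "q5"], toy_ncdif, cutoff=0.010)
```

In the toy data a second item only crosses the cut-off after the first DIF item has been removed from the linking set, which is the situation the repeated re-linking is meant to catch.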

Table 1 shows the results of the DIF and DTF analyses. The values under the #DIF
column indicate the number of items within that scale that evidence DIF using an
NCDIF criterion of .010, which we had used in our first round of analyses. The values
in parentheses are the same indicator, but with the less stringent criterion of
0.096. When looking at NCDIF, regardless of the criterion used, a large number of
individual items evidence DIF. Yet, when combined into a scale, the actual number
of items that appear to be causing bias at the test level is minimal (based on CDIF).
The number of items that contribute to DTF, based on CDIF, is listed in the column
labeled #DTF. This also represents the number of items that would need to be deleted
from the scale to remove DTF. In most cases this is a very small number.
When both the Spain v. US and China v. US analyses are considered simultaneously
(last two columns), in only a few cases did any one item contribute to DTF in both
comparisons. These analyses provide strong evidence that, at the scale level, the GPI
can function similarly across cultures, using the same item set.
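
To illustrate why many items can show NCDIF while few contribute to DTF, the following sketch computes the three indices for a toy scale. It assumes the standard DFIT definitions (NCDIF is the mean squared difference in expected item score over focal-group examinees, DTF is the mean squared difference in expected scale score, and CDIF is the mean product of the two, so that the CDIF values sum to DTF). For brevity it uses a dichotomous two-parameter logistic model rather than the polytomous model the GPI data would require, and all item parameters are invented.

```python
import math


def p_2pl(theta, a, b):
    """Probability of endorsing a dichotomous item under a 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def dfit_indices(thetas, focal_params, reference_params):
    """Return (ncdif, cdif, dtf) evaluated at focal-group trait levels.

    focal_params and reference_params are lists of (a, b) per item,
    assumed to be on a common metric already.
    """
    n = len(thetas)
    n_items = len(focal_params)
    # d[i][s]: difference in expected item score for examinee s on item i
    d = [[p_2pl(t, *focal_params[i]) - p_2pl(t, *reference_params[i])
          for t in thetas] for i in range(n_items)]
    # D[s]: difference in expected scale score for examinee s
    D = [sum(d[i][s] for i in range(n_items)) for s in range(n)]
    ncdif = [sum(x * x for x in d[i]) / n for i in range(n_items)]
    cdif = [sum(d[i][s] * D[s] for s in range(n)) / n for i in range(n_items)]
    dtf = sum(x * x for x in D) / n
    return ncdif, cdif, dtf


thetas = [-2.0 + 0.1 * k for k in range(41)]           # invented trait levels
reference = [(1.0, 0.0), (1.0, 0.0), (1.2, 0.5)]       # invented parameters
focal = [(1.0, 0.3), (1.0, -0.3), (1.2, 0.5)]          # items 1 and 2 shift in opposite directions
ncdif, cdif, dtf = dfit_indices(thetas, focal, reference)
```

Here the first two items each show DIF of similar size but opposite sign, so their differences largely cancel in the scale score and DTF comes out far smaller than either item's NCDIF.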
Final scale decisions


After the DIF iterations were complete, we had many pieces of item- and scale-level
information on which to base our decisions about final scale composition. This
information included item parameter estimates (a and b), DIF and DTF values, and
more traditional test information, such as means, standard deviations, item-total
correlations, and internal consistency estimates (see Table 2). We used a holistic
approach to make final decisions. The criteria we used to guide our decisions are
detailed below.

The first criterion we set was that retained items should have a-parameters greater
than 0.50. Items with a-parameters below this criterion offer little discrimination
among test-takers. Another criterion we used was that retained items should not have
extreme b-parameters (those above +4 or below -4). Such items discriminate best for
those with very extreme levels of a trait, which is not common among working
adults. Additionally, we considered all items on a scale simultaneously, with respect
to their b-parameters. We wanted items that, when combined, measure the full range
of the trait well (i.e., we wanted a series of items with b-parameters that allowed us
to measure the trait well within ± three standard deviations).
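
The two parameter-based screens can be expressed as a simple filter. The sketch below is illustrative only: the item records are invented, each item is given a single b-parameter for simplicity (a polytomous item would have several), and the function name `screen_items` is hypothetical.

```python
def screen_items(items, min_a=0.50, max_abs_b=4.0):
    """Keep items with a > min_a and a b-parameter within +/- max_abs_b."""
    return [it for it in items if it["a"] > min_a and abs(it["b"]) <= max_abs_b]


# Invented item parameter estimates for illustration
items = [
    {"id": "i1", "a": 0.92, "b": -1.4},
    {"id": "i2", "a": 0.41, "b": 0.3},   # too little discrimination
    {"id": "i3", "a": 1.10, "b": 4.6},   # extreme location
    {"id": "i4", "a": 0.75, "b": 1.8},
]
kept = screen_items(items)
b_range = (min(it["b"] for it in kept), max(it["b"] for it in kept))
```

Inspecting `b_range` (and the spacing of the retained b-parameters) is one rough way to check whether the surviving items still span the intended range of the trait; the holistic judgment described in the text involved more information than this.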

In addition to using IRT item parameter-related criteria, we also considered the 
DIF and DTF information. We sought to keep items that did not contribute to DTF, 
and that did not exhibit excessive NCDIF in one or both cultural comparisons. Ad- 
ditionally, we avoided keeping items that were identified by the DFITP4 program as 
those to be dropped in order to eliminate DTF. 

Lastly, throughout several rounds of dropping and adding items based on the 
IRT-based criteria listed above, we examined item-total correlations and internal 
consistency statistics. The final scales contained a minimum of seven items and a 
maximum of ten items, contained items that had acceptable item-total correlations in 
the US, Spanish, and Chinese samples, and had acceptable internal consistency. The 
previously identified problematic scales of Goal-Directed Thinking and Confor- 
mance did not attain an acceptable level of psychometric adequacy by dropping par- 
ticularly poor items. After examining both the item statistics and correlations with 
other scales, consensus was reached among the GPI development teams that these 
two constructs were both multi-dimensional and redundant with other content of the 
GPI. Thus, these scales were dropped from the GPI. The final version of the GPI 
contained 37 scales and 300 items. Sample items are included in Appendix B. 

After the new scales were established, internal consistency estimates were calculated
for the final 37 scales using data from all countries other than the US, Spain, and
China. These countries are: England, Colombia, US (for those who speak English as
a second language), Sweden, France, Japan, Germany, Singapore, and The Netherlands.
In general, alpha values for the scales in these cultures were acceptable, as
most were in the .70s and .80s ranges (see Table 3).
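
For readers who want to reproduce this kind of scale-level check, the sketch below computes coefficient alpha and corrected item-total correlations (each item against the sum of the remaining items) from a respondents-by-items matrix. The response data are invented and the function names are illustrative; no GPI data are used.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Coefficient alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def pearson_r(x, y):
    """Pearson product-moment correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(rows):
    """Correlation of each item with the sum of the other items."""
    k = len(rows[0])
    return [pearson_r([row[j] for row in rows],
                      [sum(row) - row[j] for row in rows]) for j in range(k)]


# Invented responses: five respondents, three items on a 5-point scale
responses = [[1, 2, 2], [2, 2, 3], [3, 3, 3], [4, 4, 5], [5, 4, 5]]
alpha = cronbach_alpha(responses)
item_totals = corrected_item_total(responses)
```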

For all of the language versions used to generate these alphas, item-total correla- 
Table 3. Internal reliability estimates for GPI scales in diverse cultural samples


Scale N's                   93   60   28   30  244  122  101  102  101
Adaptability (8)           .65  .79  .72  .60  .59  .65  .79  .74  .65
Attention to Detail (9)    .81  .80  .83  .80  .78  .71  .79  .93  .82
Competitiveness (8)        .62  .79  .70  .69  .83  .65  .77  .75  .70
Consideration (10)         .83  .92  .70  .83  .80  272  .84  .86  .85
D. for Achievement (8)     .76  .80  .75  .77  .80  .75  .85  .85  .82
D. for Advancement (7)     .64  .74  .75  .78  .78  .70  .64  .77  .72
Dutifulness (8)            .65  .80  .34  .75  .68  .61  .76  .73  .75
Emotional Control (7)      .71  .80  .60  .62  .71  245  .78  .77  .73
Empathy (7)                .74  .82  .76  .82  .70  .73  .68  .83  .75
Energy-Level (9)           .76  .80  .75  .81  .75  .74  .79  .84  .76
Impressing (7)            292.  .69   05  .65  .49  .41  .51  .64  .64
Independence (8)            13  FAI  .52  .65  .69  .56  .66  .68  .71
Influence (9)               TZ  .83  .89  .79  .80  .76  .83  .82  .75
Initiative (9)             .74  .85  .79  .83  .78  .71  .84  .83  .82
Innovativeness (9)         .83  .91  .68  .87  03055779  .89  .85  .85
Interdependence (8)          T  .80  .62  .76  .76  .69  .72  .69  .81
Negative Affectivity (7)   .63  .71  .58  .62  .57  .37  .64  .67  .70
Openness (7)               .62  .76  .72  .61  .73  .70  .76  .68  .79
Optimism (9)               .76  .84  .66  .78  .80  .68  .85  .83  .78
Responsibility (7)         .77  .89  .80  .45  .69  .66  .79  .87  .87
Risk-Taking (9)            .79  .85  .87  .83  .82  .81  .78  .83  .80
Self-Awareness (9)         .82  .82  .78  .77  .74  .82  .84  .86  .91
Self-Confidence (7)        ТА)  .78  .61  .59  .72  .45  .76  .79  .87
Sociability (9)            .83  .86  .83  .87  .77  .74  .89  .80  .75
Social Astuteness (8)      .75  .78  .56  .75  .73  .60  .78  .81  .73
Stress Tolerance (8)       .76  .84  .73  .80  .74  .68  .85  .78  .72
Taking Charge (10)         .83  .90  .88  .88  .86  .79  .87  .89  .82
Thought Agility (9)        .82  .91  .80  .78  .75  .67  .86  .82  .80
Thought Focus (7)          .68  .85  .80  .54  .76  .74  .76  .82  .83
Trust (7)                  272  .81  .64  .72  .64  259  .76  .67  .65
Vision (9)                 .79  .86  .71  .82  .75  .65  .81  .83  .83
Work Focus (9)             .79  .76  .76  .79  .76  .74  .82  .77  .81
Ego-Centered (7)           .64  .64  .52  .58   Zi  .45 1259  .61 257]
Intimidating (7)           .66  .64  .56  .55  .56  .48  .54  .65  .49
Manipulation (10)          .79  .79  27]  .71  .82  .63  .81  .74  272
Micro-managing (7)         .69  .74  .63  .55  .63  .50  .62  .60  .55
Passive-Aggressive (7)     .56  .69  .71  .79  293  .46  .67  .60  .50


Note: The values in parentheses next to the scale names indicate the number of items in each scale 
tions were examined in an effort to identify aberrant results across the versions. For
items that had adequate item-total correlations in most language versions but did not 
in one or two, we examined the items for possible translation problems. Translator- 
psychologist meetings were used to identify both grammatical and cultural transla- 
tion problems. The translator would back translate the item to English and the psy- 
chologist would work with the translator to arrive at a translation that captured the 
original psychological content of the item. Nearly all identified items were changed 
in at least some small way in an effort to improve their psychometric quality. These 
items are flagged for future analyses when additional data are available for the latest 
edition of the items and scales to which they belong. 


Scale validation 


Comparability of factor structure across cultures 


To this point in the test development, our focus has been on the item and scale
levels. However, we were also interested in exploring whether the factor structure
was similar across the cultures studied and whether it conformed to the Big Five
structure. We utilized targeted Procrustes techniques instead of CFA (i.e., confirmatory
factor analytic) techniques because of some of the disadvantages of CFA,
including technical difficulties in the estimation of some models (McCrae, Zonderman,
Costa, Bond, & Paunonen, 1996) and the tendency in CFA for several substantively
different models to fit any given data set at a similar level (MacCallum,
Wegener, Uchino, & Fabrigar, 1993).

Before conducting the factor analyses, we excluded from analysis any trait composite
scale because each of these scales is composed of constructs that are both
within and external to what most researchers would conceive of as the bounds of the
Big Five factor structure (Ego-centered, Manipulating, Intimidating, Micro-managing,
Passive-aggressive, Impressing, Self-awareness/Self-insight). Of the
37 scales, we thus used 30 for the Procrustes analysis. We always used the
principal components, varimax-rotated factor structure from the American sample as
the target matrix and rotated the principal components, varimax-rotated factor
structures from the Spanish and Chinese samples to that matrix. Three-, four-, five-,
six-, and seven-factor solutions were computed in this manner and congruence
coefficients were calculated (see McCrae et al., 1996, for specifics on the methodology).
As can be seen from Table 4, the factor structures were stable for the three-, four-,
and five-factor solutions (i.e., all factor congruence coefficients at or above
approximately .90, cf. Paunonen, 1997), but became unstable at six and seven factors.
Given that we hypothesized and were interested in the five-factor solution, we will
concentrate on this interpretation of the data.
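
The congruence coefficient used to compare a rotated factor with its target can be computed directly from the two columns of loadings. The sketch below assumes Tucker's coefficient (the sum of cross-products divided by the square root of the product of the sums of squares); it does not perform the Procrustes rotation itself, and the loadings shown are invented.

```python
import math


def congruence(x, y):
    """Tucker's congruence coefficient between two loading vectors."""
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return numerator / denominator


# Invented loadings of four scales on one factor
target_loadings = [0.70, 0.65, 0.10, -0.05]    # e.g., target (American) structure
rotated_loadings = [0.68, 0.60, 0.20, 0.02]    # e.g., a sample rotated to the target
phi = congruence(target_loadings, rotated_loadings)
```

Applied to a column of the loading matrices, this gives a factor congruence coefficient; applied to a row (one scale's loadings across all factors), it gives the variable-level coefficient.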
Table 4. Factor congruence coefficients with the American normative structure


Number of Factors      Congruence Coefficients

3 (U.S. vs. Spain)     .97  .93  .97
3 (U.S. vs. China)     .97  .92  .95
4 (U.S. vs. Spain)     .97  .96  .96  .94
4 (U.S. vs. China)     .97  .93  .91  .91
5 (U.S. vs. Spain)     .98  .94  .95  .94  .92
5 (U.S. vs. China)     .95  .91  .92  .91  .89
6 (U.S. vs. Spain)     .97  .95  .80  .91  .96  .75
6 (U.S. vs. China)     .95  .90  .81  .88  .91  .70
7 (U.S. vs. Spain)     .98  .93  .96  .97  .96  .84  .89
7 (U.S. vs. China)     .97  .86  .91  .92  .92  .67  .84


Note: Principal components, Varimax-rotated. Final form of the GPI was used with trait composites
removed. For the five-factor solution, Factor 1 = Extroversion, Factor 2 = Neuroticism, Factor 3 =
Agreeableness, Factor 4 = Conscientiousness, Factor 5 = Openness to Experience.


Factor loadings for each of the five-factor Procrustes solutions are displayed in Table 5.
The factor loadings were highly consistent with the authors' a priori hypotheses.
Although some secondary loadings were evident, these results are
consistent with the level of equivalence found with the NEO-PI-R across cultures
(Costa & McCrae, 1997; McCrae et al., 1996). Moreover, the item congruence coefficients,
which index the degree to which the items fit the factor structure, averaged
.95 in the U.S. versus Spain comparison (range = .77 to .99) and .92 in the U.S. versus
China comparison (range = .82 to 1.00). As with the factor congruence coefficients,
item congruence coefficients above .90 are generally considered to be evidence
of a good fit of the model to the data (Paunonen, 1997). Using Paunonen's
(1997) Monte Carlo-generated equation of congruence prediction (p. 52), all of the
item and factor congruence coefficients were above the 95 per cent confidence limit
of what might be expected if one factored randomly generated data and attempted to
fit a model with the same number of variables and factors.

The confirmation of a Five Factor model underlying the GPI provides robust evi- 
dence for construct validity of the measure. Further, the facet level loading results 
are highly consistent with previous measures of the Big Five, suggesting that this 
cross-cultural measure of personality is likely to perform at least as well as instru- 
ments developed and transported with an imposed-etic strategy. 

We subsequently conducted similar analyses in which we used the five-factor,
principal components, varimax-rotated factor structure from the original
American sample as the target matrix and rotated the principal components,
varimax-rotated factor structures from three other samples to that matrix. These three
other samples were: (1) a random, stratified sample of respondents from U.S. households
(N = 988), (2) a sample of individuals who participated in PDI assessments for
developmental purposes (N = 946), and (3) a sample of individuals who participated
in PDI assessments for promotion/selection purposes (N = 1,069). The distribution
of age across the organizational samples was similar in both mean level and dispersion
(promotion/selection M = 39.47 and SD = 8.44; development M = 41.06 and SD
= 7.01). The U.S. normative sample was both older (M = 52.79) and approximately
twice as variable in age (SD = 15.19) as either of the organizational samples. The
organizational samples were composed of approximately 70-75 per cent males
whereas the normative sample was composed of approximately 50-55 per cent males.
The organizational samples were predominantly managerial whereas the normative
sample contained a range of job levels.

Factor loadings for these five-factor Procrustes solutions are displayed in Table 6.
For the U.S. normative and PDI developmental samples, the factor and item congruence
coefficients were above .90. However, for the promotion/selection sample, the
Neuroticism factor congruence coefficient was .88 and the Openness factor congruence
coefficient was .82. Also for this sample, the item (in this context, scale) congruence
coefficients were .55 for Interdependence, .62 for Emotional Control, and .62
for Independence.

These results are not unexpected, as research has shown that high-stakes testing
situations, such as when one is taking a test for promotion or selection purposes,
tend to degrade the construct validity of personality measures (Schmit & Ryan,
1993). It is unlikely that the degradation of construct validity evidenced in the
promotion/selection sample seriously affects decisions that are made from the instrument
in that context. In fact, a recent study by Robie (2000), comparing the scale score
means and standard deviations from the above organizational samples, found very
small average differences (less than 1/5 of one standard deviation) and very similar
standard deviations across the two samples. Moreover, item-level IRT analyses
comparing incumbent and applicant responses to a personality measure that the GPI
is partially based upon found little differential item or test functioning (Robie,
Zickar, & Schmit, in press).


Current validation studies 


Several additional validation studies are either still being conducted or have been 
completed. These studies are briefly described here. 


Reliability and related issues 


A study of the reliability of the GPI has been conducted at two large US universities 
using undergraduate students (Ryan, Robie, Schmit, & Uhlmann, 2000). Trait meas- 
ures should be internally consistent and highly consistent across time. For the IRT 
analyses in the development study, response options were collapsed from five to 
three because of the distributions of responses in the available samples. While we 
have left the number of response options at five in the current version of the GPI, 2-, 
3-, and 5-point response scales have been compared in this large sample student 
study to identify the level at which the optimal information is gained and least error 
introduced. The total number of participants in this study was approximately 300 
with approximately 100 participants per response scale. 

Participants in the Ryan et al. (2000) study were asked to fill out the GPI twice,
with the administrations separated by two weeks. Results indicated that scale scores
from the 5-point response scale were more internally consistent (mean alphas of .73
and .77), more stable across time (mean r = .78), and less skewed than the scale scores
from the other two response scale formats. No differences emerged in the amount of
missing data or in participant reactions to the various formats. The results
of the Ryan et al. (2000) study suggest that the present use of the 5-point
response format is probably best practice for this particular personality measure.


Criterion-related validity 


Several studies are being conducted to compare various performance criteria with
GPI scales across cultures and management levels. The job performance model used
in the development of the GPI was also used to develop the criterion measures for
these studies. Our research plan also includes the comparison of a common criterion
measure across cultures using techniques similar to those used for predictor
evaluation. Appendix A shows the expected relationships between GPI facet scales
and job performance factors.

Initial results from four concurrent validation studies are provided in Table 7. The
four samples were all from middle management jobs, including HR Managers and
Division Managers (both in Retail), Distribution Center Management, and Managers
in a State agency (public sector). As the total N of the samples is relatively small,
correlations are reported at the broadest level. GPI facet scales were combined in
unit-weighted composites at the Big Five level, while criterion ratings were aggregated
to form a single, overall performance composite. Because the criterion measures
differed slightly from study to study, criterion scores were standardized within
study before aggregation. Consistent with previous meta-analytic work (e.g., Barrick
& Mount, 1991), the results showed Conscientiousness and Extroversion factors to
have the strongest relationship to overall performance. In addition, the remaining
three predictor composites were related to overall job performance, but at relatively
lower levels.
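
The aggregation steps described above (unit-weighted composites, within-study standardization of the criterion, then pooling) can be sketched as follows. The study labels, facet scores, and ratings are invented, and the helper names are illustrative; the sketch shows the order of operations only, not the actual analysis.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def pearson_r(x, y):
    """Pearson product-moment correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Invented records: (facet scores for one Big Five factor, performance rating).
# The two studies use different rating metrics, hence within-study z-scores.
studies = {
    "study_a": [([4, 3, 3], 2.0), ([4, 4, 4], 3.0), ([5, 5, 5], 4.0)],
    "study_b": [([4, 4, 3], 50.0), ([5, 4, 4], 60.0), ([6, 5, 5], 80.0)],
}

composites, criterion = [], []
for records in studies.values():
    composites += [sum(facets) for facets, _ in records]       # unit weights
    criterion += zscores([rating for _, rating in records])    # per study
r = pearson_r(composites, criterion)
```

Standardizing within study before pooling keeps differences in rating scales between studies from masquerading as, or masking, predictor-criterion covariation.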

GPI scale scores, 360 feedback, and other assessment center data are currently 
being collected. These data will provide further criterion-related validity evidence 
for inferences made using the GPI. To date, we have examined the relations between 
the GPI (scored at the Big Five level) and scales (averaged across boss, peer, and 


Table 7. Correlation of Big Five predictor composite scores and overall job performance 


                              M      SD    Overall Job Performance
Overall Job Performance      1.49   6.49   -
Extraversion                55.78   9.49   r = .30*
Agreeableness               34.68   4.99   r = .22*
Conscientiousness           22.98   3.62   pi 255
Neuroticism                 22.79   2.91   r = .14
Openness to Experience      24.98   3.58   n= AT


Note: *One-tailed test, significant at p < .05. All correlations are uncorrected. N = 198. 
subordinate ratings) on a 23-scale PDI 360-degree feedback instrument called the
Executive Success Profile (ESP; Hezlett, Ronnkvist, Holt, & Sloan, 1997; definitions
of each scale can be found in Appendix C) for 115 executives (54%) and middle
managers (35%) from a variety of organizations who participated for developmental
purposes. The sample was predominantly Caucasian (95%), male (83%), and
middle-aged (M = 43.01, SD = 6.65). Agreeableness and Neuroticism on the GPI
were not significantly related (α = .05) to any of the ESP scales. Conscientiousness
on the GPI was significantly related (α = .05) to the following ESP scales: Building
Organizational Relationships (r = .20), Fostering Open Dialogue (r = .19), and Drive
for Stakeholder Success (r = .21). Extroversion on the GPI was significantly related
(α = .05) to the following ESP scales: Visionary Thinking (r = .26), Global Perspective
(r = .26), Shaping Strategy (r = .27), Driving Execution (r = .27), Empowering
Others (r = .24), Influencing and Negotiating (r = .26), High Impact Delivery (r =
.20), Entrepreneurial Risk Taking (r = .29), Cross-functional Capability (r = .24),
Industry Knowledge (r = .23), Business Situation Versatility (r = .27), and Leading
Continuous Improvement (r = .29). Lastly, Openness on the GPI was significantly
related (α = .05) to the following ESP scales: Visionary Thinking (r = .31), Global
Perspective (r = .33), Shaping Strategy (r = .25), Driving Execution (r = .20), High
Impact Delivery (r = .19), Entrepreneurial Risk Taking (r = .31), Industry Knowledge
(r = .26), Business Situation Versatility (r = .25), and Leading Continuous
Improvement (r = .26).

We also examined the correlations of the GPI trait composite scales with the ESP
scales for this organizational sample. The following GPI trait composite scales were
not significantly related (α = .05) to any of the ESP scales: Ego-Centered,
Micro-Managing, and Passive-Aggressive. The GPI Impressing scale was significantly
related (α = .05) to the Adaptability scale on the ESP (r = .21). The GPI
Self-Awareness/Self-Insight scale was significantly related (α = .05) to the Financial
Acumen scale on the ESP (r = -.19). The GPI Intimidating scale was significantly
related (α = .05) to the following ESP scales: Visionary Thinking (r = .22), Financial
Acumen (r = .22), and Entrepreneurial Risk Taking (r = .20). The GPI Manipulation
scale was significantly positively related (α = .05) to every ESP scale, with an
average r = .26, except Financial Acumen and Inspiring Trust!

The Manipulation trait scale may be akin to the Machiavellian factor that has 
been found in lexical (i.e., Big Five) studies as a sixth factor (Ashton, Lee, & Son, 
2000). Wilson, Near, and Miller (1996; 1998) describe Machiavellianism as a set of 
manipulative strategies of social conduct and suggest that it is actually an adaptive 
characteristic in some situations but not others. Wilson et al. (1996; 1998) also state
that those who score high on Machiavellianism are often charming and attractive in
short-term social interactions. Perhaps many raters in 360-degree feedback contexts
only observe the ratees in the context of these short-term interactions; it is important
to understand that 360-degree feedback systems are not designed to provide a true
measure of performance but, instead, the perceptions of various constituencies
regarding a given person’s performance. Thus, it is uncertain from these data whether
one is to infer that Machiavellianistic forms of behavior are truly adaptive for man- 
Table 9. Example GPI Scale Norm Differences Among Different Populations


                               General        Individual      Middle
                               Population     Contributors    Managers        Executives
                               N = 461        N = 252         N = 289         N = 140
Scale (# of items)             M     SD       M     SD        M     SD        M     SD
Desire for Advancement (7)     3.20  1.04     3.30  (015      3.89  TS        4.38  1.01
Influence (9)                  5.07  (eile!   5.08  1.19      6.04  1.10      6.22  1.03
Risk Taking (9)                4.77  1.34     5517821738      5.87  1.35      6.01  1.22
Taking Charge (10)             5.80  1.68     5.88  1.51      7.07  1.29      7.57  1.01
Responsibility (7)             5.45  (01777)  5.45  0.92      5.58  0.88      5.48  0.80


agers and executives, or are simply an effective means of appearing to be effective. 
Interestingly, the profile of an individual who scores high on the Manipulation scale 
tends to be someone who is high on Desire for Advancement (r = .20), Independ- 
ence (r = .20), and Negative Affectivity (r = .45) and low on Dutifulness (r = -.31), 
Interdependence (r = -.23), Responsibility (r = -.23), Stress Tolerance (r = -.21), and 
Trust (r = -.35). Clearly, this is not an entirely positive picture! 


Construct validity 


We are currently collecting data in order to compare GPI scale scores to other well- 
established personality measures. In addition, a study of self-other responses to the 
GPI is being conducted. In this study, individuals who complete the GPI will have 
significant others in their lives complete the same measure about them. These stud- 
ies will provide further evidence for the construct validity of the GPI scales. 

Some additional information about the construct validity of the trait composite 
and syndrome scales in the GPI is provided in Table 8. The table includes correla- 
tions among the Big Five trait composites and the syndrome scales from data col- 
lected from the general population norm group noted in the next section. 


General population and specific norms 


Studies have been conducted to collect both general and management population
norms for the United States. Similar normative data have been collected for other
countries and regions as well. The general population data were based on a stratified
sample from random U.S. households. The data represent all working adults at all
levels in organizations. The final group included approximately 1000 adults. More
specific norms are currently being collected for executives, middle managers (managers
of managers), and individual contributors. Table 9 contains a sample of GPI
scales and norms from the four populations. These data illustrate that the facet scales 
presented are detecting differences across levels that might be expected. There are 
mean differences across levels for the Desire for Advancement, Influence, Taking 
Charge, and Risk Taking scales; for each scale, the higher in the organization, the 
higher the mean score. However, the Responsibility scale does not vary across levels,
as might be expected. Many of the other scales in the GPI also reflect past findings
with similar constructs and are consistent with theoretical expectations, providing
further support for the construct validity of the instrument.
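
The size of these level differences can be summarized with a standardized mean difference. The chapter does not report effect sizes; the sketch below swaps in Cohen's d with a pooled standard deviation as one common choice, using the individual contributor and executive means, standard deviations, and group sizes as read from Table 9 (the group sizes are taken from the table header and are an assumption of this example).

```python
import math


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference (group 2 minus group 1), pooled SD."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m2 - m1) / math.sqrt(pooled_var)


# Individual contributors (n = 252) versus executives (n = 140), Table 9
d_taking_charge = cohens_d(5.88, 1.51, 252, 7.57, 1.01, 140)
d_responsibility = cohens_d(5.45, 0.92, 252, 5.48, 0.80, 140)
```

On these figures the Taking Charge difference exceeds one pooled standard deviation, while the Responsibility difference is close to zero, which matches the pattern described in the text.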


Summary 


There are two features of the GPI that make it unique among previously developed 
personality measures. First, the measure was not developed in a specific country and 
then transported to others. Rather, it was developed with input from around the 
world. Second, applied psychologists developed the measure explicitly for work- 
related applications. 

The development and validation of the GPI followed a method that truly took a 
global approach. The methods included both rational and empirical techniques. We 
started with input from many cultures, both in the development of constructs and in 
the operationalization of those constructs in the item content. Statistical methods 
were used with data from several cultures in order to optimize the possibility that the 
GPI would be a test that can be useful to applied psychologists in many, if not all, 
cultures. Both item-level and scale-level construct validity were established using 
traditional and modern psychometric methods. Current research is focused on 
building criterion-related validity and additional construct validity evidence for the 
inferences to be made with the GPI. Again, a multi-cultural approach is being fol- 
lowed. We believe this global approach will result in a measure that is useful to ap- 
plied psychologists in many cultures. 

Applied psychologists who wish to interpret scores across cultures need a com- 
mon framework provided by a tool in which they can have confidence that scores 
mean the same thing in different cultures. For example, if a psychologist in Japan 
wishes to assess a Japanese manager for potential assignment in the US, he or she 
could compare the scores of that manager to both Japanese and US managers to 
identify how the manager stacks up in each group. Armed with this information, the 
psychologist can make decisions about the compatibility and potential for success as 
a manager in the US management culture, in addition to providing information im- 
portant for coaching prior to the transitions. The GPI provides the common frame- 
work necessary for this and many other applications of cross-cultural assessment 
because the evidence for the construct validity of the tool across cultures is strong. 
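
As a concrete illustration of comparing one manager against two norm groups, the sketch below converts a raw scale score to a percentile within each group. It assumes approximately normal score distributions, which the chapter does not claim. The US figures are the middle manager Taking Charge norms from Table 9; the Japanese norms and the manager's score are hypothetical values invented for the example.

```python
from statistics import NormalDist


def percentile_in_norm_group(score, norm_mean, norm_sd):
    """Percentile of a raw score within a norm group, assuming normality."""
    return 100.0 * NormalDist(norm_mean, norm_sd).cdf(score)


norms = {
    "Japanese managers": (6.40, 1.20),   # hypothetical norm values
    "US managers": (7.07, 1.29),         # Table 9, middle managers, Taking Charge
}
manager_score = 7.0                       # hypothetical raw score
percentiles = {group: percentile_in_norm_group(manager_score, m, sd)
               for group, (m, sd) in norms.items()}
```

The same raw score can sit well above the middle of one norm group and near the middle of another, which is the kind of information the psychologist in the example would use.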

The second feature of the GPI is that it was developed specifically for work- 
related uses (e.g., selection, development, coaching, etc.). Personality and perform- 
ance models were considered both in the conceptual development of the GPI’s con- 
tent and in the writing and selection of items. Applied psychologists from many 
cultures came to consensus on what constituted successful performance at work. 
They also came to consensus on what personality constructs were important to 
measure in order to better understand the behaviors and predictors of behaviors in 
those performance constructs. Further, during the item writing process, the psy- 
chologists were asked to think about both the predictor and criterion space when 
writing the items. The result was that a work context was either specifically mentioned
in many items or at least implied. Previous research has shown the possible
advantages of such an approach (Schmit, Ryan, Stierwalt, & Powell, 1995). While it
is still an empirical question with limited support to date, we believe that the GPI will be
a useful tool in the prediction of work-related behavior across cultures.

Appendix A: Performance Factors and GPI Personality Facets


This is a model of PDI Performance Factors and associated GPI traits and trait composites. 
The assumption is that individuals with relatively better scores (i.e., in the desired direction) 
on traits associated with a performance factor will indeed perform better in parts of their 
work that fall into these factor categories. 


I. Administrative Factor


1. Attention to Detail - This is a measure of the tendency to be exacting and precise. 
This is a trait characterized by: a desire for accuracy, neatness, thoroughness, and 
completeness; the ability to spot minor imperfections or errors; and a meticulous 
approach to performing tasks. 


2. Work Focus - This is a measure of the tendency to be self-disciplined in one's 
approach to work. This is a trait characterized by: efficient work habits; being 
planful and organized; being focused on the process of task implementation; being 
able to concentrate on what is most important at the moment; not being distracted 
easily by others or one's own boredom; and not procrastinating on tasks that are
unpleasant or not very exciting. 


II. Thinking Factor


1. Thought Agility - This is a measure of the tendency to be open both to multiple 
ideas and to using alternative modes of thinking. It is a measure of divergent 
thinking that is focused on the input and processing of information. This is a trait 
characterized by: thought flexibility; the ability to think things through by looking 
at many perspectives; the desire to draw out ideas from others; and a willingness to 
consider others' ideas along with one's own.


2. Innovativeness/Creativity - This is a measure of the tendency to produce unique 
and original things. It is a measure of divergent thinking that is focused on the gen- 
eration and output of unique ideas and expressions of ideas. This trait is character- 
ized by: being inventive; being imaginative; being expressive of ideas and feelings 
through original and unique output. 


3. Thought Focus - This is a measure of the tendency to understand ambiguous in- 
formation by analyzing and detecting the systematic themes in the data. It is a 
measure of convergent thinking that is focused on the input and processing of in- 
formation. This is a trait characterized by: analytical and logical thinking ability; 
the ability to find patterns in data that may seem initially unsystematic or ambigu- 
ous; a desire to focus on finding a single best answer rather than proposing multiple 
possibilities; a preference for objective rather than subjective input; and a desire to 
use a systematic approach to guide thinking. 


4. Vision - This is a measure of the tendency to have foresight in one's thinking. 
This trait is characterized by: the ability to visualize outcomes; the tendency to
think in a holistic manner; taking into account all variables that will affect future
events; the tendency to take a long range perspective in one's thinking; and the 
ability to anticipate future needs, problems, obstacles, eventualities, and outcomes. 

III. Leadership Factor: Facilitating Traits and Derailing Trait Composites


A. Facilitating Traits 


1. Taking Charge - This is a measure of the tendency to take a leadership role. This 
trait is characterized by: a desire to direct the activities of others; an ability to mo- 
bilize others to take action; a desire to take a leadership role; a desire to step for- 
ward when there is no clear leader; and a willingness to take responsibility for 
guiding others’ actions. 


2. Influence - This is a measure of the tendency to get others to view and do things 
in a certain way. This trait is characterized by: being persuasive; negotiating well; 
impacting the thoughts and actions of others; gaining support and commitment from 
others; being diplomatic; and using tact. 


B. Derailing Trait Composites 


1. Ego-Centered - This is a measure of the tendency to be self-centered and appear 
egotistical. This is a trait composite characterized by: appearing overly involved 
with and concerned about one's own well being and importance; an inflated evalua- 
tion of personal skills and abilities; appearing condescending to others; and an atti- 
tude of entitlement to position and rewards. 


2. Manipulation - This is a measure of the tendency to be self-serving and sly. This 
trait composite is characterized by: a tendency to try to cover up mistakes; the abil- 
ity to protect oneself by shifting blame onto others; carefully sharing information to 
serve one's own purpose to the detriment of others; and a willingness to take ad- 
vantage of others. 


3. Micro-Managing - This is a measure of the tendency to over-manage once a per- 
son has advanced to higher levels of management. This trait composite is charac- 
terized by: staying involved in too many decisions rather than passing on responsi- 
bility; doing detailed work rather than delegating it; and staying too involved with 
direct reports rather than building teamwork among the staff. 


4. Intimidating - This is a measure of the tendency to use power in a threatening 
way. This trait composite is characterized by: acting cold and aloof; an abrasive
approach to others; a bullying style; and the use of knowledge or power to create fear
in or subdue others. 


5. Passive-Aggressive - This is a measure of the tendency to avoid confronting others,
conveying acceptance or cooperation and yet appearing to behave in uncooperative
and self-serving ways. This trait is characterized by: communicating or implying
cooperation, conveying acceptance by lack of objection, or expressing support
for another person's idea, but behaving in contradictory ways that serve one's
self-interest or potentially undermine the efforts of others who are possible threats.


IV. Interpersonal Factor 


1. Sociability - This is a measure of the tendency to be highly engaged by any so- 
cial situation. This trait is characterized by: being friendly; a desire to be involved 
in situations with high opportunity for interpersonal interaction; an enjoyment of
other people's company; and a need to interact with others frequently throughout the 
day. 


2. Consideration - This is a measure of the tendency to express care about others'
well-being. This trait is characterized by: showing concern for others; demonstrating
compassion, warmth, and sensitivity towards others' feelings and needs; and
supporting or taking care of others in need.


3. Empathy - This is a measure of the tendency to understand what others are expe- 
riencing and to convey that understanding to them. This trait is characterized by: a 
desire to listen to, understand, and accept others' problems or opinions; an ability to 
understand the practical and emotional needs of others; an ability to communicate to 
others the understanding of their experiences; an ability to respond to others in a 
way that is non-judgmental and respects them as unique human beings and full contributors
to society; an ability to "feel with" as opposed to "feel for" others; and a
capacity to identify with others on an emotional level. 


4. Trust - This is a measure of the tendency to believe that most people are good 
and well-intentioned. This trait is characterized by: a belief in the goodness of peo- 
ple; a belief that most people are trustworthy; and not being skeptical or cynical 
about the nature of people's intentions and behaviors.


5. Social Astuteness - This is a measure of the tendency to accurately perceive and 
understand the meaning of social cues and use that information to accomplish a de- 
sired goal. This trait is characterized by: an ability to detect social cues and inter- 
pret how these social cues are related to the underlying motives of other people; a 
desire to understand how others might act based on their intentions, motivations, 
and concerns; and an ability to read and respond to the positions of others in a given 
situation. 


V. Work Orientation 


A. Individualism Traits


1. Independence - This is a measure of the tendency to be autonomous. This trait is 
characterized by: a preference to make decisions without input from others; a pref- 
erence to not be dependent on others; and a desire to not be closely supervised or 
work in an interdependent group or organization. 


2. Competitiveness - This is a measure of the tendency to evaluate one's own per- 
formance in comparison to others. This trait is characterized by: a desire to do 
better than others in many ways; an enjoyment of situations that can lead to a clear 
winner and loser; and a preference for an environment in which people are differen- 
tiated by accomplishments that come at a cost to others. 


3. Risk-Taking - This is a measure of the tendency to take chances based on lim- 
ited information. This trait is characterized by: an enjoyment of situations with un- 
certainty; being entrepreneurial; deriving personal satisfaction from making deci- 
sions based on limited information; and being adventurous. 


4. Desire for Advancement - This is a measure of the tendency to be ambitious in 
the advancement of one's career or position in organizational hierarchy. This trait is 
characterized by: a desire to get to the top levels of organizational hierarchy; a determination
to succeed in one's chosen career path; a preference for advancement
potential over job security; and a continual desire to get ahead of where one is cur- 
rently in work and life in general. 


B. Collectivism Traits


1. Interdependence - This is a measure of the tendency to work well with others.
This trait is characterized by: an ability to perform well in groups; a desire to work 
closely with others on shared work; active cooperation with others; a desire to build 
supportive networks of communication; flexible cooperation in conflict resolution 
situations; and a preference to work toward the goals of the group rather than indi- 
vidual goals. 


2. Dutifulness - This is a measure of the tendency to be filled with a sense of moral 
obligations. This trait is characterized by: a desire to do what is right; the practice 
of good business ethics; a desire to meet moral and legal obligations; and an adher- 
ence to a set of commonly held or societal laws. 


3. Responsibility - This is a measure of the tendency to be reliable and dependable.
This trait is characterized by: a willingness to behave in expected and agreed upon 
ways; following through on assignments and commitments; keeping promises; and 
accepting the consequences of one's own actions. 


VI. Self-Management Factor 


1. Adaptability - This is a measure of the tendency to be open to change and considerable
variety. This trait is characterized by: a willingness to change one's approach;
being flexible; a willingness to adjust to constraints, multiple demands, and
adversity; and demonstrating versatility in handling different types of people and 
situations. 


2. Openness - This is a measure of the tendency to accept and respect the individual 
differences of people. This trait is characterized by: an understanding of the 
uniqueness of all people; a desire to understand different cultures, values, opinions, 
and belief systems; a mind set that all people have value; and an openness to the 
possibility that human differences need not be either bad or good.


3. Negative Affectivity - This is a measure of the tendency to be generally unsatis- 
fied with many things, including but not limited to work. This trait is characterized 
by: a tendency to be unsatisfied with one's position, organization, pay, and other
aspects of work; a general negative attitude; and a general dissatisfaction with 
one's life events and surroundings. 


4. Optimism - This is a measure of the tendency to believe that good things are pos- 
sible. This trait is characterized by: showing high spirits in just about any situation; 
being happy, joyful, and excited about things; and demonstrating enthusiasm in 
challenging situations. 


5. Emotional Control - This is a measure of the tendency to be even-tempered.
This trait is characterized by: the ability to stay calm and collected when confronted
with adversity, frustration, or other difficult situations; an ability to avoid defensive
reactions or hurt feelings as a result of others' comments; an ability to be emotionally
unaffected by external events that one has no control over; and not showing extreme
positive or negative mood swings.


6. Stress Tolerance - This is a measure of the tendency to endure typically stressful 
situations without undue physical or emotional reaction. This trait is characterized 
by: being free from anxieties; not worrying excessively; demonstrating a relaxed 
approach to stressful situations; and an ability to tolerate stress imposed by other 
people or circumstances. 


7. Self-Confidence - This is a measure of the tendency to believe in one's own 
abilities and skills. This trait is characterized by: a tendency to feel competent in 
several areas; a tendency to demonstrate an attitude that one can succeed in endeav- 
ors; and a belief that one is capable and self-determined. 


8. Impressing - This is a measure of the tendency to try to make a good impression
on others. This trait is characterized by: a desire to please others; a tendency to tell 
people what they want to hear; the use of flattery and craftiness to manipulate the 
impressions held by others; being cautious not to expose one's true self image; and 
not being frank and forthcoming. 


9. Self-Awareness/Self-Insight - This is the tendency to be aware of one's strengths 
and weaknesses. This trait is characterized by: self-insight into one's motives, 
needs, and values; an ability to avoid self-deception regarding strengths and weaknesses;
an understanding of one's limitations; and the tendency to study and understand
one's own behavior.


VII. Motivation Factor 


1. Energy Level - This is a measure of the tendency to be highly active and ener- 
getic. This trait is characterized by: a need to keep busy doing something at all 
times; a preference for a fast-paced lifestyle; and a tendency to avoid inactive events 
or situations. 


2. Initiative - This is a measure of the tendency to take action in a proactive, rather 
than reactive, manner. This trait is characterized by: a desire to take action where 
others might take a wait-and-see approach; a desire to find ways to get things 
started; a desire to volunteer to take on new responsibilities; and a willingness to 
take on new or additional challenges. 


3. Desire for Achievement - This is a measure of the tendency to have a strong 
drive to realize personally meaningful goals. This trait is characterized by: being 
challenged by difficult goals; being energized by accomplishing goals; a desire to 
work hard to achieve goals; taking satisfaction from doing something difficult; and 
pushing one's self outside of one's comfort zone to achieve a goal. 

Appendix B: GPI Personality Facets and Sample Items


Scale                         Sample Item

Agreeableness
Consideration                 I like to do little things for people to make them feel good.
Empathy                       I take other people's circumstances and feelings into consideration before making a decision.
Interdependence               I tend to put group goals first and individual goals second.
Openness                      I do not have to share a person's values to work well with that person.
Thought Agility               I think it is vital to consider other perspectives before coming to conclusions.
Trust                         I believe people are usually honest with me.

Conscientiousness
Attention to Detail           I like to complete every detail of tasks according to the work plans.
Dutifulness                   I conduct my business according to a strict set of ethical principles.
Responsibility                I can be relied on to do what is expected of me.
Work Focus                    I prioritize my work effectively so the most important things get done first.

Extroversion
Adaptability                  For me, change is exciting.
Competitiveness               I like to win, even if the activity isn't very important.
Desire for Achievement        I prefer to set challenging goals, rather than aim for goals I am more likely to reach.
Desire for Advancement        I would like to attain the highest position in an organization someday.
Energy Level                  When most people are exhausted from work, I still have energy to keep going.
Influence                     People come to me for inspiration and direction.
Initiative                    I am always looking for opportunities to start new projects.
Risk-Taking                   I am willing to take big risks when there is potential for big returns.
Sociability                   I find it easy to start up a conversation with strangers.
Taking Charge                 I actively take control of situations at work if no one is in charge.

Neuroticism
Emotional Control             Even when I am very upset, it is easy for me to control my emotions.
Negative Affectivity          I am easily displeased with things at work.
Optimism                      My enthusiasm for living life to its fullest is apparent to those with whom I work.
Self-Confidence               I am confident about my skills and abilities.
Stress Tolerance              I worry about things that I know I should not worry about.

Openness to Experience
Independence                  I tend to work on projects alone, even if others volunteer to help me.
Innovativeness/Creativity     I work best in an environment that allows me to be creative and expressive.
Social Astuteness             I know what is expected of me in different social situations.
Thought Focus                 I quickly make links between causes and effects.
Vision                        I can often foresee the outcome of a situation before it unfolds.

Trait Composites
Ego-Centered                  I have often wondered how others would manage without me.
Impressing                    It is always best to keep important people happy.
Intimidating                  It is sometimes necessary to criticize others openly and publicly for their poor performance.
Manipulating                  [illegible] can serve as excellent tools for getting what you want or need.
Micro-Managing                Delegation weakens the power of a leader.
Passive-Aggressive            There are times I say I will cooperate when I know I will not do it.
Self-Awareness/Self-Insight   I know what motivates me.






Appendix C: ESP Scales and Definitions 


1. Thinking Factor 


a. Seasoned Judgment: Applies broad knowledge and seasoned experience when addressing
complex issues; defines strategic issues clearly despite ambiguity; takes
all critical information into account when making decisions; makes timely, tough 
decisions. 


b. Visionary Thinking: Has a clear vision for the business or operation; maintains a
long-term, big-picture view; foresees obstacles and opportunities; generates
breakthrough ideas.


c. Financial Acumen: Understands the meaning and implications of key financial in- 
dicators; manages overall financial performance (income statement and balance 
sheet); uses financial analysis to evaluate strategic options and opportunities. 


d. Global Perspective: Keeps abreast of important trends that impact the business or 
organization (technological, competitive, social, economic, etc.); understands the 
position of the organization within a global context. 


2. Strategic Management Factor 


a. Shaping Strategy: Develops distinctive strategies to achieve competitive advan- 
tage; translates broad strategies into specific objectives and action plans; aligns the 
organization to support strategic priorities. 


b. Driving Execution: Assigns clear authority and accountability; directs change 
while maintaining operating effectiveness; integrates efforts across units and func- 
tions; monitors results; tackles problems directly and with dispatch. 


3. Leadership Factor 


a. Attracting and Developing Talent: Attracts high caliber people; develops teams 
and talent with diverse capabilities; accurately appraises the strengths and weak- 
nesses of others; provides constructive feedback; develops successors and talent 
pools. 


b. Empowering Others: Creates a climate that fosters personal investment and excel- 
lence; nurtures commitment to a common vision and shared values; gives people 
opportunity and latitude to grow and achieve; promotes collaboration and team- 
work. 


c. Influencing and Negotiating: Promotes ideas and proposals persuasively; shapes 
stakeholder opinions; projects a positive image; works through conflicts; negoti- 
ates win/win solutions. 


d. Leadership Versatility: Plays a variety of leadership roles (e.g., driving, delegat- 
ing, supporting, coaching) as appropriate; adapts style and approach to match the 
needs of different individuals and teams. 

4. Interpersonal Factor


a. Building Organization Relationships: Cultivates an active network of relationships 
inside and outside the organization; relates well to key colleagues (i.e., bosses, 
peers, direct reports); stays in touch with employees at all levels. 


b. Inspiring Trust: Establishes open, candid, trusting relationships; treats all individu- 
als fairly and with respect; behaves in accord with expressed beliefs and commit- 
ments; maintains high standards of integrity. 


5. Communication Factor 


a. Fostering Open Dialogue: Promotes a free flow of information and communication 
throughout the organization (upward, downward, and across); listens actively; en- 
courages open expression of ideas and opinions. 


b. High Impact Delivery: Delivers clear, convincing, and well-organized presenta- 
tions; projects credibility and poise even in highly visible, adversarial situations. 


6. Motivation Factor 


a. Drive For Stakeholder Success: Sets and pursues aggressive goals; drives for re- 
sults; demonstrates a strong commitment to organizational success; works to do 
what is best for all stakeholders (customers, shareholders, employees, etc.). 


b. Entrepreneurial Risk Taking: Champions new ideas and initiatives; identifies new
business opportunities and makes them a reality; fosters innovation and risk taking.


7. Self-Management Factor


a. Mature Confidence: Realistically appraises own strengths and weaknesses; shares 
credit and visibility; maintains and projects confidence, even when not supported 
by others. 


b. Adaptability: Maintains a positive outlook, resisting stress and working construc- 
tively under pressure; responds resourcefully to change and ambiguity. 


c. Career and Self-direction: Conveys a clear sense of personal goals and values; 
manages time efficiently; pursues continuous learning and self-development. 


8. Breadth and Depth Factor 


a. Cross-functional Capability: Understands the role and interrelationships of each
organizational function (e.g., marketing, sales, operations, finance, human resources);
has experience and skill in managing across functional and organizational
lines.


b. Industry Knowledge: Knows what it takes to be successful in this industry; has a
thorough knowledge of this industry's history, customers, and competitive environment.


c. Business Situation Versatility: Knows how to size up and meet the challenges of
different business situations (e.g., start-up, fast growth, steady state, turnaround,
close-down, merger/acquisition).


d. Leading Continuous Improvement: Initiates, directs, and sustains efforts to ensure 
continuous learning and improvement throughout the organization. 


Chapter 10 


The Traits Personality Questionnaire (TPQue) 


Ioannis Tsaousis


Introduction 


The Traits Personality Questionnaire (TPQue) is a comprehensive Greek-language
measure of the five major dimensions of personality and of the most important traits
that define each of them. The constructs of the TPQue were formed according to the 
content and structure of the NEO-PI-R of Costa and McCrae (1992). We considered 
the NEO-PI-R to be a more adequate basis for developing a comprehensive
personality trait inventory than other tests and models of trait personality
theory, including the work of Eysenck, Cattell, and Guilford. We took this
model as starting-point not only because of its comprehensiveness but also because 
of its being so well documented. According to Costa and McCrae (1992) their model 
comprises five main domains that "...give a quick grasp of the major features", and 
30 facets that "...allow more detailed analysis of the particular forms in which these 
major domains are expressed" (p. 39). Under this perspective, it could be argued that 
there is a direct link between the NEO-PI-R and the TPQue, since the conceptual 
definitions provided by Costa and McCrae (1992), for their NEO-PI-R, constitute 
the theoretical framework of the five factor scales and the 30 sub-scales of the 
TPQue.


The development of the TPQue 


Two main approaches have been adopted to construct the factor scales and the factor 
sub-scales that constitute the TPQue: a deductive as well as an inductive technique. 
Following the deductive approach, questionnaire items were written to reflect the a 
priori model of personality. Following this principle, all the items in the TPQue
were written to reflect behavior, habitual responses, and trait dispositions of the five- 
factor model of personality. Following the inductive approach, data were collected 
and reduced to establish the simplest, most parsimonious statistical solution. For this
purpose, exploratory as well as confirmatory factor analyses were used; these meth- 
ods were expected to provide a brief and comprehensive factor structure model. 

The developmental stages of the TPQue included three phases: the first phase in- 
cluded the generation of the items, the pilot study and the item analysis; the second 
phase included the psychometric evaluation (reliability and validity) of the newly 
developed questionnaire, and finally, the third phase included the standardization 
procedure (development of norms). 


Item generation 


The first step was the development of the items. During this stage we focused on 
determining the factor scales and sub-scales of the conceptual model and on writing
appropriate items to measure them. Since the main objective of the present ques- 
tionnaire was to tap the specific ethnic and cultural characteristics of the Greek 
population as they are sedimented in the Greek language, the adaptation of the con- 
ceptual definitions provided in the NEO-PI-R could not directly be used as a depar- 
ture point for the development of items. For this reason, a taxonomic study was car- 
ried out in order to identify Greek adjectives that conceptualize the factor scales and 
sub-scales of the Greek instrument. 

For each of the 30 NEO-PI-R sub-scales that constitute the theoretical framework 
of the Greek questionnaire, positive and negative definitive adjectives were identi- 
fied in a Greek Thesaurus (Sakellariou, 1991). The following procedure was fol- 
lowed: first, the main adjective for each sub-scale (the corresponding translation of 
Costa & McCrae's facet name) was established (e.g., Warm, Vulnerable, Modest) 
and their synonyms (e.g., friendly, sensitive, humble) and antonyms (e.g., aloof, 
hard, boastful) were identified; subsequently, synonyms and antonyms of those ear- 
lier synonyms and antonyms were collected, and so on until a sufficient number of 
adjectives in both categories (positive and negative) was established. 
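
The iterative collection of synonyms and antonyms can be pictured as a breadth-first expansion from the facet's main adjective. The sketch below is only an illustration of that idea: the toy thesaurus, the `expand` function, and the `max_per_pole` stopping rule are assumptions, since the chapter says only that collection continued until a sufficient number of adjectives was reached.

```python
from collections import deque

# Toy thesaurus: word -> (synonyms, antonyms). Entries are illustrative only.
THESAURUS = {
    "warm": (["friendly"], ["aloof"]),
    "friendly": (["warm", "kind"], ["cold"]),
    "aloof": (["cold"], ["friendly"]),
    "cold": (["aloof"], ["warm"]),
    "kind": (["friendly"], []),
}


def expand(seed, thesaurus, max_per_pole=20):
    """Collect positive and negative adjectives reachable from a seed word."""
    positive, negative = {seed}, set()
    queue = deque([(seed, +1)])  # (word, polarity relative to the seed)
    while queue:
        word, polarity = queue.popleft()
        synonyms, antonyms = thesaurus.get(word, ([], []))
        # Synonyms keep the polarity of the current word; antonyms flip it.
        candidates = [(s, polarity) for s in synonyms]
        candidates += [(a, -polarity) for a in antonyms]
        for nxt, pol in candidates:
            pool = positive if pol > 0 else negative
            if nxt in positive or nxt in negative or len(pool) >= max_per_pole:
                continue
            pool.add(nxt)
            queue.append((nxt, pol))
    return positive, negative


pos, neg = expand("warm", THESAURUS)
print(sorted(pos), sorted(neg))
```

In the actual study the selection was made by hand from a printed thesaurus, so the automatic polarity assignment here should be read as a simplification.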

These clusters of synonyms and antonyms (approximately 15-20 per sub-scale) 
formed the basic item pool, from which items for each sub-scale were written. A 
total of 600 items were written that might be used to measure the 30 sub-scales and 
the 5 factor scales of the TPQue, which was at least one statement for each adjective. 
Following certain exclusion criteria (ambiguity, negations, idiomatic expressions, 
suggestive formulations, sexist or ethnocentric formulations, etc.) the best 360 items 
were selected. To investigate whether each of the selected items reflected the con- 
ceptual definition of each sub-scale, ratings from 20 judges were obtained, using a 
5-point rating scale ranging from 1 (not representative at all) to 5 (very representa- 
tive). The results revealed that only 6 of the 360 selected items were not representa- 
tive of the construct they intended to measure. Although the results suggested that 
these items should be eliminated, we decided to keep them in the pilot study to see 
how they would perform in the field test. As expected, all six items were eliminated in
subsequent construction stages. 
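
A screening step of this kind might look as follows. This is a hedged sketch: the chapter does not report the rule used to decide that an item was not representative, so the cutoff of 3.0 (the scale midpoint), the function name, and the ratings are all assumptions made for illustration.

```python
from statistics import mean


def flag_unrepresentative(ratings, cutoff=3.0):
    """Return item ids whose mean judge rating falls below the cutoff.

    ratings: dict mapping item id -> list of 1-5 ratings, one per judge.
    cutoff: assumed threshold; the chapter does not report the rule used.
    """
    return sorted(item for item, scores in ratings.items() if mean(scores) < cutoff)


# Three hypothetical items rated by four hypothetical judges.
ratings = {
    "item_001": [5, 4, 5, 4],
    "item_002": [2, 3, 2, 1],
    "item_003": [3, 3, 4, 3],
}
print(flag_unrepresentative(ratings))
```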
Apart from the resulting 360 items, the initial pool of TPQue items contained an
additional 30 items intended to measure lying and social desirability responses. The 
two scales each comprised 15 items which were mainly elicited from corresponding 
scales of well-known inventories such as MMPI-L and -K scales (Hathaway & 
McKinley, 1983), EPQ-L scale (Eysenck & Eysenck, 1975), Edwards Social Desir- 
ability scale (Edwards, 1957), etc. At the end of this stage, the total number com- 
posing the initial pool of TPQue items was 390. 


Pilot study 


The 390 items were used in a pilot-study in order to reduce the number and to create 
the final version of the TPQue. The subject sample consisted of 138 students (17-32 
years of age), recruited from various universities in Greece. Participants were asked 
to indicate to what extent each item applied to them, using a 4-point Likert-type scale,
ranging from 0 (strongly disagree) to 3 (strongly agree). It was decided that the neu- 
tral option should be left out at this stage, in order to force individuals to choose one 
of the other remaining categories, hence eliminating the effect of central tendency. 


Item analysis 


The first step in item selection was the development of a marker set of items for each 
sub-scale and each factor scale separately. Markers form a core cluster of items 
which is closely related to other items for the scale, but not closely related to items 
for other scales or sub-scales. The advantage of using a marker set in this initial 
stage of item selection is that overlap between items from various sub-scales is con- 
trolled, and also that relative independence between factor scales is ensured, this 
being one of the assumptions of the conceptual model. 

Three stages can be distinguished in the development of each set of sub-scale 
markers: First, all the items relating to each sub-scale (12 items per sub-scale) were 
collected and their mean scores from the pilot study estimated. Second, using Prin- 
cipal Component Analysis, all the items with mean scores between 1 and 2 were 
factor analyzed. Finally, four items (two with the highest positive and two with the 
highest negative loading, to control for the acquiescence effect) from the generated 
factor analytic solutions were chosen, and a marker set of items for each of the 30 
sub-scales was composed. 
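
The first and third stages of this procedure can be sketched in a few lines. The code assumes that the component loadings have already been obtained from a principal component analysis of the eligible items; the item labels, means, and loadings are invented, and treating "between 1 and 2" as an inclusive range is also an assumption.

```python
def eligible_items(item_means, low=1.0, high=2.0):
    """Keep items whose pilot mean lies between low and high (0-3 response scale)."""
    return [item for item, m in item_means.items() if low <= m <= high]


def pick_markers(loadings, per_pole=2):
    """Pick the items with the highest positive and the highest negative loadings."""
    positive = sorted((i for i in loadings if loadings[i] > 0),
                      key=loadings.get, reverse=True)
    negative = sorted((i for i in loadings if loadings[i] < 0),
                      key=loadings.get)
    return positive[:per_pole], negative[:per_pole]


# Hypothetical pilot means for seven items of one sub-scale.
item_means = {"a": 1.4, "b": 0.6, "c": 1.9, "d": 1.1, "e": 2.7, "f": 1.5, "g": 1.2}
eligible = eligible_items(item_means)

# Hypothetical first-component loadings for the eligible items only.
loadings = {"a": 0.71, "c": -0.65, "d": 0.58, "f": -0.44, "g": 0.32}

print(eligible)
print(pick_markers(loadings))
```

Balancing positively and negatively loading items in each marker set is what allows the acquiescence effect to be controlled.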

The next step was to develop factor markers for each of the five factor scales, 
following the same procedure. First, all items that had been used in the sub-scale 
markers for each factor (6 sub-scales x 4 items = 24 items) were collected (forming 
an initial item pool for each factor scale), and factor analyzed (Principal Compo- 
nents). Then, two items (one with the highest positive and one with the highest 
negative loading) from each sub-scale marker were selected, generating a factor 
marker scale of 12 equally balanced positive and negative items for each factor scale 
(6 sub-scales x 2 items = 12 items).

Subsequently, each factor marker scale was correlated with the remaining factor
marker scales, together with variables such as the L scale, SDR scale, sex, and age, 
in order to investigate the relationship of each scale with its own items, and to elimi- 
nate items that were highly correlated with other scales. Additionally, in order to 
improve the cohesiveness of each factor marker scale, the two items with the lowest 
correlations (one positive and one negative) were identified and replaced with the 
two items with the highest correlations, which were elicited from the remaining 12 
items rejected from the initial item pool of each factor marker scale. 

In the final step, each item was correlated with every scale, and each scale was 
correlated with every other scale within the same factor, as well as with scales from 
different factors. Items were only selected if they were highly correlated with the 
scale under construction, and of low correlation with the other scales or sub-scales. 


Description of the TPQue 


The TPQue consists of 180 items measuring the five broad dimensions of personal- 
ity, and thirty specific sub-scales, which correspond to the most influential traits of 
the five domains. Each factor scale consists of 36 items, including 6 items per sub- 
scale. Additionally, it contains two independent scales (consisting of 13 items each) 
measuring lying and social desirability responses (Tsaousis, 1996), giving a total 
number of 206 items. Item responses are recorded on a 5-point scale ranging from 
strongly disagree (1) to strongly agree (5), indicating to what extent individuals 
agree or disagree with the content of the items; the estimated completion time is 
approximately 30 minutes. 
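
A minimal sketch of how responses on the 5-point scale could be turned into a sub-scale score is given here. It assumes that scores are simple sums of item responses and that items marked (R) in Table 1 are mirrored around the scale midpoint before summing; the chapter does not spell out either step, and the function names and the sample answers are hypothetical.

```python
def score_item(response, reverse=False, low=1, high=5):
    """Return the scored value of one response on a low..high agreement scale."""
    if not low <= response <= high:
        raise ValueError(f"response {response!r} is outside {low}..{high}")
    # A reverse-keyed item mirrors the response around the scale midpoint.
    return low + high - response if reverse else response


def score_subscale(responses, reverse_keyed):
    """Sum the scored items of one sub-scale.

    responses     -- dict mapping item id to the raw response (1-5)
    reverse_keyed -- set of item ids that are marked (R)
    """
    return sum(score_item(value, item in reverse_keyed)
               for item, value in responses.items())


# Hypothetical answers to the six items of one sub-scale.
answers = {"q1": 4, "q2": 2, "q3": 5, "q4": 1, "q5": 3, "q6": 4}
total = score_subscale(answers, reverse_keyed={"q2", "q4"})
print(total)  # 4 + 4 + 5 + 5 + 3 + 4 = 25
```

Under this assumption a six-item sub-scale score can range from 6 to 30 and a 36-item factor score from 36 to 180.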

The most important characteristics that define each of the TPQue factor scales
are presented next. Table 1 presents examples of items for each of the 30 TPQue
sub-scales.


Extraversion 


People who score high on this TPQue factor scale are sociable, like going to social 
events such as parties, etc., and generally feel very comfortable when they are 
amongst others. They like talking a lot, are optimists, active, and like facing new 
adventures, and, generally speaking, are considered as warm and enjoyable people. 
People who tend to score low on this scale are reserved (without being unfriendly),
independent (rather than followers), and even-paced (rather than sluggish).
They are usually shy when meeting new people, but this is not an indication that 
they suffer from social anxiety. Finally, they are not as enthusiastic as extraverts are, 
which, however, does not mean that they are pessimistic or unhappy. 

Table 1. Examples of items for TPQue sub-scales

Sub-scales                    Items

Extraversion sub-scales
Warmth (E1)                   I usually get involved emotionally in my friends' problems
Gregariousness (E2)           I do not like going to parties (R)
Assertiveness (E3)            Very often I take on the responsibility of organizing the
                              activities of my company
Activity (E4)                 I consider myself an active and energetic person
Excitement Seeking (E5)       Usually, I try to avoid daring situations (R)
Positive Emotions (E6)        I consider myself an optimistic person

Neuroticism sub-scales
Anxiety (N1)                  Many people think of me as a person who does not feel afraid
                              easily
Angry Hostility (N2)          Quite often I get mad with others
Depression (N3)               I think that I feel sad more often than other people do
Self-Consciousness (N4)       I have no problem going into a class that is full of people,
                              who have already started a discussion
Impulsiveness (N5)            I believe that I am a person who can control their emotions (R)
Vulnerability (N6)            Sometimes I feel so helpless, that I ask someone else to help me

Openness sub-scales
Fantasy (O1)                  I consider myself a person with a rich, active imagination
Aesthetics (O2)               Reading literature bores me (R)
Feelings (O3)                 Sometimes I feel guilty about things that happen around me,
                              about which I do nothing to change (e.g. poverty, misery, etc.)
Actions (O4)                  I like to taste traditional dishes from various countries (R)
Ideas (O5)                    I think of myself as open minded
Values (O6)                   I am among those who believe that there is not always only one
                              truth

Agreeableness sub-scales
Trust (A1)                    Most of the people I know are good and honest
Straightforwardness (A2)      Flattering people is a good way of asking them to do what you
                              want them to (R)
Altruism (A3)                 When somebody needs me, I always help them
Compliance (A4)               I consider myself as a competitive person (R)
Modesty (A5)                  I prefer not speaking about myself
Tender-Mindedness (A6)        I find it essential to be aware of social policy issues

Conscientiousness sub-scales
Competence (C1)               Sometimes, I feel completely useless (R)
Order (C2)                    I find a well organized life-style with pre-scheduled
                              activities fits my personality perfectly
Dutifulness (C3)              I usually avoid giving promises, because I know that I rarely
                              keep them (R)
Achievement Striving (C4)     I like to put goals in my life, and work hard to achieve them
Self-Discipline (C5)          When I am dealing with a task, I concentrate on it until I finish
Deliberation (C6)             Very often people tell me that I am frivolous

Note: Items marked with (R) are reverse scored.


Neuroticism


High scorers on this factor scale are usually people that live stressful lives, who are 
very anxious about all their dealings with day-to-day life, and generally worry about 
matters too much; they have the tendency to experience fear more frequently than 
other people do, are unable to resist their cravings or urges, and are vulnerable, and 
because of this, they experience depressive emotions and are very sensitive to ridi- 
cule. People who score low on this factor scale are usually characterized by emo- 
tional stability; they are relaxed most of the time, rarely get upset, and under stress- 
ful or dangerous situations they keep their nerves calm; they do not worry about 
things that are going to happen in the future, and generally feel secure and self- 
satisfied. 


Openness to experience 


People who score high on this scale are open-minded, look forward to learning and 
discovering new things, and see every new experience as a challenge to their abili- 
ties. They have a very active imagination, and frequently use it to escape from real- 
ity. They appreciate art as an aspect that enriches their inner world, and experience 
both happiness and sadness with very strong emotions. Low scorers on this factor 
scale do not like changes, prefer the old and well-established ways of doing things, 
and have more conservative ideas than high scorers; they are not interested in art, 
and consider people with an active imagination immature; they prefer to keep their 
feet on the ground, and their emotional responses are usually muted. 


Agreeableness 


Individuals who score high on this scale are people who usually care about others 
and try to help all in need; they are modest and inclined to forgive easily even when 
they have been hurt, and they trust everyone, since they believe that most people 
have good intentions; they are straightforward and sincere, and if something annoys 
them, they prefer to discuss it directly with the person responsible, rather than ac- 
cuse him/her behind his/her back. People who score low on this scale are egocentric 
in their behavior, highly antagonistic rather than cooperative with their colleagues or 
friends, and most of the time they are skeptical of others' intentions; they like to 
manipulate others for their own gains, and consider flattering as a skill that helps in 
achieving their goals; they rarely forget someone who has hurt or blamed them, and 
they can sometimes be very cruel or cynical. 


Conscientiousness


High scorers on this scale are confident about their capabilities, are dutiful, and 
strictly adhere to their ethical principles and moral obligations; order plays an es- 
sential role in their life, and as such they are very well organized, and they think 
very carefully before acting; they tend to control their desires or impulses, whilst 
highly motivated to get the job done; they are ambitious, make plans and set
goals, and work hard to achieve them. People who score low on this scale do not
have self-control over their behavior, desire or disagreements; they rarely keep their 
promises, hate order, and prefer not making future plans for their life; many people 
consider them as unreliable, lazy, and lax. 


Reliability of the TPQue 


To investigate the TPQue's reliability, indices of internal consistency and
test-retest reliability were estimated. Furthermore, in order to investigate the
stability of each scale over time, the mean scores obtained from two separate
administrations were compared, using the t-test for related samples.


Internal consistency 


Cronbach's alphas were computed for each factor scale and sub-scale of the TPQue. 
Table 2 gives coefficient alphas for the factor scales and sub-scales along with their
standard errors of measurement (SEM).

The TPQue factor scales have coefficient alphas ranging from .78 to .89. Most of 
the individual sub-scales have internal consistencies ranging from .51 to .80, which 
are acceptable for scales with only six items (Cattell & Kline, 1977). However, six 
sub-scales (E1, N5, O5, A2, A3, and A6) have lower alphas (ranging from .34 to
.48), suggesting a broader mix of items or lower homogeneity.
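
The quantities reported in Table 2 can be illustrated with a short sketch. It assumes the standard definitions, which the chapter does not state explicitly: coefficient alpha computed from item and total-score variances, SEM (raw) = SD × √(1 − α), and SEM (T) = 10 × √(1 − α) for T scores with a standard deviation of 10. The function names and the small response matrix are hypothetical.

```python
import math
from statistics import variance


def cronbach_alpha(rows):
    """Coefficient alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def sem_raw(sd, reliability):
    """Standard error of measurement in raw-score units."""
    return sd * math.sqrt(1 - reliability)


def sem_t(reliability, t_sd=10.0):
    """Standard error of measurement in T-score units (SD of 10 assumed)."""
    return t_sd * math.sqrt(1 - reliability)


# Invented responses of five people to a three-item scale.
rows = [[4, 5, 4], [2, 3, 3], [5, 5, 4], [1, 2, 2], [3, 3, 4]]
alpha = cronbach_alpha(rows)

# Extraversion row of Table 2: alpha = .88, SD = 17.16.
print(round(alpha, 2), round(sem_raw(17.16, 0.88), 2), round(sem_t(0.88), 2))
```

Applied to the Extraversion row, these formulas give a raw SEM of about 5.94, which matches the tabled value, and a T-score SEM of about 3.46, close to the tabled 3.47.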


Test-retest reliability 


Test-retest data were collected from a sample of 125 individuals, who were tested 
over a time interval of 4 weeks. As shown in Table 3, the test-retest correlations for 
factor scales ranged from .89 to .95 (p < .01), and reliability correlations for the
sub-scales ranged from .72 to .91 (p < .01). In both cases, the results indicate the stability
of all the TPQue factor scales and sub-scales. 

Table 2. Internal consistency for TPQue factor scales and sub-scales (N = 1,054)

                                 α        M      SD  SEM (raw)   SEM (T)
Factor Scales
Extraversion (E)               .88   122.40   17.16       5.94      3.47
Neuroticism (N)                .89   112.25   18.97       6.29      3.32
Openness (O)                   .83   126.48   15.70       6.47      4.12
Agreeableness (A)              .78   118.44   13.07       6.13      4.69
Conscientiousness (C)          .88   115.07   17.43       6.04      3.47
Sub-Scales
Warmth (E1)                    .48    20.57    3.40       2.45      9.52
Gregariousness (E2)            .68    21.08    4.13       2.34      5.66
Assertiveness (E3)             .71    19.19    4.16       2.24      5.39
Activity (E4)                  .65    19.98    3.72       2.20      5.19
Excitement Seeking (E5)        .62    20.32    3.65       2.25      6.16
Positive Emotions (E6)         .62    21.26    3.84       2.37      6.16
Anxiety (N1)                   .75    19.94    4.66       2.33      5.00
Angry Hostility (N2)           .75    19.36    4.61       2.30      5.00
Depression (N3)                .73    18.93    4.39       2.28      5.20
Self-Consciousness (N4)        .66    17.96    4.34       2.53      5.83
Impulsiveness (N5)             .37    19.36    3.38       2.69      7.94
Vulnerability (N6)             .79    16.69    4.93       2.26      4.58
Fantasy (O1)                   .70    22.39    4.15       2.27      5.48
Aesthetics (O2)                .80    22.10    4.47       2.12      4.47
Feelings (O3)                  .51    21.36    3.57       2.50      7.00
Actions (O4)                   .59    20.46    3.89       2.49      6.40
Ideas (O5)                     .34    19.80    3.41       2.77      8.12
Values (O6)                    .69    20.10    4.50       2.51      5.57
Trust (A1)                     .57    20.85    3.37       2.21      6.56
Straightforwardness (A2)       .37    18.60    3.34       2.65      7.94
Altruism (A3)                  .45    19.69    3.06       2.27      7.42
Compliance (A4)                .64    18.70    4.00       2.40      6.00
Modesty (A5)                   .63    19.20    3.97       2.41      6.08
Tender-Mindedness (A6)         .47    21.40    3.17       2.31      7.28
Competence (C1)                .61    20.16    3.64       2.27      6.24
Order (C2)                     .68    17.89    4.82       2.73      5.66
Dutifulness (C3)               .53    18.85    3.63       2.49      6.86
Achievement Striving (C4)      .61    20.22    3.76       2.34      6.24
Self-Discipline (C5)           .59    19.03    3.70       2.37      6.40
Deliberation (C6)              .68    18.91    4.05       2.29      5.66

Note: α = Cronbach's coefficient alpha, SEM (raw) = Standard error of measurement for raw
data, SEM (T) = Standard error of measurement for T scores.


Table 3. Test-retest reliability coefficients for TPQue factor scales and sub-scales (N = 125)

                               1st Testing      2nd Testing  Test-retest       SEM
                                   M      SD        M      SD Reliability      Raw      T
Factor Scales
Extraversion (E)              122.40   17.16   122.87   16.39       .91      5.15   3.00
Neuroticism (N)               112.25   18.97   107.10   19.34       .95      4.24   2.24
Openness (O)                  126.48   15.70   130.86   15.77       .93      4.15   2.65
Agreeableness (A)             118.44   13.07   117.58   12.83       .89      4.33   3.32
Conscientiousness (C)         115.07   17.43   113.89   16.80       .93      4.61   2.65
Sub-Scales
Warmth (E1)                    20.57    3.40    21.03    2.87       .72      1.80   5.29
Gregariousness (E2)            21.08    4.13    21.02    3.79       .83      1.70   4.12
Assertiveness (E3)             19.19    4.16    19.40    4.14       .89      1.38   3.32
Activity (E4)                  19.98    3.72    19.51    3.60       .85      1.44   3.87
Excitement Seeking (E5)        20.32    3.65    20.20    3.38       .85      1.41   3.87
Positive Emotions (E6)         21.26    3.84    21.70    3.78       .87      1.38   3.61
Anxiety (N1)                   19.94    4.66    18.78    4.74       .89      1.55   3.32
Angry Hostility (N2)           19.36    4.61    18.39    4.60       .87      1.66   3.61
Depression (N3)                18.93    4.39    17.55    4.44       .86      1.64   3.74
Self-Consciousness (N4)        17.96    4.34    16.62    4.20       .86      1.62   3.74
Impulsiveness (N5)             19.36    3.38    19.49    3.10       .79      1.55   4.58
Vulnerability (N6)             16.69    4.93    16.27    4.98       .91*     1.48   3.00
Fantasy (O1)                   22.39    4.15    22.87    3.96       .84      2.62   4.00
Aesthetics (O2)                22.10    4.74    23.48    3.69       .84      2.99   4.00
Feelings (O3)                  21.63    3.57    22.25    3.28       .73*     1.86   5.20
Actions (O4)                   20.46    3.89    20.59    3.95       .85      1.51   3.87
Ideas (O5)                     19.80    3.41    20.08    3.58       .80      1.52   4.47
Values (O6)                    20.10    4.50    21.10    4.68       .91      1.35   3.00
Trust (A1)                     20.85    3.37    21.52    3.22       .79*     1.54   4.58
Straightforwardness (A2)       18.60    3.34    18.67    3.04       .80      1.49   4.47
Altruism (A3)                  19.69    3.06    18.94    2.94       .78*     1.44   4.69
Compliance (A4)                18.70    4.00    18.84    3.91       .83      1.65   4.12
Modesty (A5)                   19.20    3.97    18.74    3.70       .80      1.77   4.47
Tender-Mindedness (A6)         21.40    3.17    20.87    2.81       .77*     1.52   4.80
Competence (C1)                20.16    3.64    20.29    3.54       .82      1.54   4.24
Order (C2)                     17.89    4.82    18.04    4.43       .90      1.52   3.16
Dutifulness (C3)               18.85    3.63    18.30    3.47       .81*     1.58   4.36
Achievement Striving (C4)      20.22    3.76    19.93    3.59       .85      1.46   3.87
Self-Discipline (C5)           19.03    3.70    18.76    3.55       .84      2.34   4.00
Deliberation (C6)              18.91    4.05    18.57    3.95       .86      1.52   3.74

Note: 1st Testing = first administration, 2nd Testing = second administration after 4 weeks, SEM
(raw) = Standard error of measurement for raw data, SEM (T) = Standard error of measurement for
T scores, * = significant differences between scores from the two administrations.

Table 4. Inter-correlations among Factor Scales of the TPQue

          N          O          A          C
E      -.35**      .39        .07*       .30*
N                 -.07**      .03       -.38**
O                             .19       -.04
A                                       -.17**

Note: E = Extraversion, N = Neuroticism, O = Openness to Experience, A = Agreeableness, C =
Conscientiousness; * p < 0.05; ** p < 0.01


Finally, when the stability of the mean scores of the five scales was investigated,
all factor scales showed remarkable stability over time, since the mean scores from
the two administrations did not differ significantly. When the sub-scales' mean
scores from the two administrations were tested, only 6 of the 30 sub-scales (N6, O3,
A1, A3, A6, and C3) were found to differ significantly (Table 3).
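
The two test-retest statistics used in this section, the correlation between administrations and the t-test for related samples, can be sketched as follows. The scores are invented, the function names are hypothetical, and the paired t statistic is computed from the standard textbook formula (mean difference divided by its standard error); the resulting t would still have to be compared against a t distribution with n − 1 degrees of freedom.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_t(first, second):
    """Return (t, degrees of freedom) for the related-samples t-test."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1


# Invented sub-scale scores of six people tested twice, four weeks apart.
first = [20, 22, 19, 24, 18, 21]
second = [21, 22, 20, 23, 19, 22]

r = pearson_r(first, second)
t, df = paired_t(first, second)
print(round(r, 2), round(t, 2), df)
```

A high correlation together with a small t value is the pattern the text describes for the factor scales: individuals keep their rank order and the group mean does not shift.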


Relationships among the TPQue factor scales 


The degree of overlap between factor scales was an obvious consideration during
the construction of the TPQue, as were good internal consistency and the
psychological meaningfulness of all scales. As mentioned earlier, in the development
procedure we tried to build relatively independent factor scales out of items keyed
to one and only one scale, in order to avoid artifactual correlations. This was
successful to some extent: no factor scale was highly correlated with any other.
However, in some cases, where the concepts were closely related, there was some
overlap. Table 4 shows the inter-correlations among TPQue factor scales.


Validity of the TPQue 


Validity can be explored in several ways, and the TPQue provides evidence for
most of them. More specifically, there is evidence available to support the content
validity, construct validity, and factorial validity of the measure.


Content validity 


According to Haynes, Richard, and Kubany (1995), content validity is the degree to 
which elements of an assessment instrument are relevant to and representative of the 
targeted construct for a particular assessment purpose. The relevance of a measure 
refers to the appropriateness of its elements for the targeted construct and function of 
assessment (Ebel & Frisbie, 1991; Guion, 1977; Messick, 1993). The representa- 
tiveness of an instrument refers to the degree to which its elements are proportional 
to the facets of the targeted construct (Lynn, 1986; Nunnally & Bernstein, 1994;
Suen & Ary, 1989). TPQue demonstrates its content validity by providing evidence 
for its two basic components: representativeness and relevance. 

In Costa and McCrae's (1992) NEO-PI-R, representativeness is addressed by 
identifying six distinct facets to sample each domain and by selecting non-redundant 
items to measure each facet. Since TPQue uses the same hierarchical organization as 
in the NEO-PI-R (Kline, 1993), the representativeness of its sub-scales is also as- 
sumed. In terms of the relevance of the items in measuring the targeted constructs, 
independent judges were used in order to decide whether the written items reflect 
what they are trying to measure. 

For this purpose, 20 judges evaluated the relevance of the selected items to the
targeted constructs, using a 5-point scale (1: not representative at all, 5: very
representative). The mean score for each item was then estimated, and items with
a mean score above 3.5 were selected. All 180 items were found to meet the specified
cut-off point, suggesting that TPQue items are highly related to the constructs they
measure.
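
The judges' cut-off rule amounts to a simple filter on mean ratings, sketched below. The function name, the item identifiers, and the ratings are hypothetical, and only four judges are shown for brevity where the study used 20.

```python
from statistics import mean


def relevant_items(ratings, cutoff=3.5):
    """Return the ids of items whose mean judge rating exceeds the cut-off.

    ratings -- dict mapping item id to the list of ratings (1-5) it received
    """
    return sorted(item for item, scores in ratings.items()
                  if mean(scores) > cutoff)


# Invented ratings from four judges for three candidate items.
ratings = {
    "item_a": [4, 5, 4, 4],   # mean 4.25
    "item_b": [3, 3, 4, 3],   # mean 3.25
    "item_c": [5, 4, 4, 3],   # mean 4.00
}
kept = relevant_items(ratings)
print(kept)  # ['item_a', 'item_c']
```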


Convergent and discriminant validity of the TPQue 


A usual step in scale development involves the concurrent validation process in 
which the new scale is correlated with other scales that are posited to tap similar 
processes. More specifically, TPQue factor scales were correlated with the Eysenck 
Personality Questionnaire — EPQ (Eysenck & Eysenck, 1975; Demetriou, 1986), 
the 16 Personality Factors Questionnaire, Form C — 16PF-C (Cattell, Eber, & 
Tatsuoka, 1970), Observer ratings for Openness, Agreeableness, and Conscientious- 
ness Scales, the Minnesota Multiphasic Personality Inventory — MMPI (Hathaway 
& McKinley, 1983), Self Directed Search — SDS (Holland, Fritzsche, & Powell, 
1985), Job Satisfaction Scale — JSS (Warr, Cook, & Wall, 1979), and Organiza- 
tional Culture Inventory — OCI (Cooke & Lafferty, 1989). Table 5 presents evi- 
dence for convergent and discriminant validity of the factor scales of the TPQue. 


The TPQue and Eysenck's EPQ 


At the top of Table 5 correlations between TPQue and EPQ scales are provided. As 
can be seen from the results, the TPQue Neuroticism (N) and Extraversion (E) factor 
scales were correlated highly with the corresponding EPQ N and E dimensions. Ad- 
ditionally, the TPQue Agreeableness (A) and Conscientiousness (C) factor scales 
were correlated negatively with the EPQ Psychoticism (P) scale, a result which is 
consistent with Eysenck's argument that P is a blend of low C and A (Eysenck, 
1992).

Table 5. Correlation coefficients of the TPQue Factor Scales with various criterion scales

Criterion Scales                        TPQue E   TPQue N   TPQue O   TPQue A   TPQue C
EPQ (N = 88)
Extraversion                             .82*     -.35*      .32*      212       .19
Neuroticism                             -.36*      .69*     -.19      -.08       13,
Psychoticism                             .10      -.06      -.10      -.44*     -.28
16PF (N = 83)
Self-Reliance (Q2)                      -.40**     .09      -.15       .04      -.02
Tension (Q4)                            -.10       :99**     .04      -.10      -.12
Openness to Change (Q1)                  224      -.22       .49**     .01       .11
Perfectionism (Q3)                      -.12      -.40**     250^      23:       5375
Observer ratings (N = 86)
Openness                                                     .42**     .14       .08
Agreeableness                                                .14       32935     .26
Conscientiousness                                            .08       .07       oon
MMPI (N = 76)
Hypochondriasis (Hs)                    -.42**     .39**    -.24       .03      -.07
Depression (D)                          -.51**     суда     -.40**     .06      -.15
Hysteria (Hy)                           -.44**     BEAT     -.31*     -.02      -.07
Psychopathic Deviate (Pd)               -.11       .39**    -.02      -.02      -.02
Masculinity-Femininity (Mf)             -.14       .16       .22       Hy        .10
Paranoia (Pa)                           -.35**     .64**    -.15      -.01      -.30*
Psychasthenia (Pt)                      -.44**     757      -.28*     -.14       = 455°
Schizophrenia (Sc)                      -.48**     .69**    -.25      -.15       =. Dilan
Hypomania (Ma)                           .07       :362*     su       -.13      -.28*
Social Introversion-Extroversion (Si)   -.56       :6155    -.31*      .05      -.30*
SDS (N = 152)
Realistic                                234      -.25**     .004     -.06       .10
Investigative                            .16*     -.28**     .24**    -.05       22396
Artistic                                 .03       .11       .40**    -.03      -.12
Social                                   .20*      .10       221^      .14       .07
Enterprising                             .43**    -.24**    -.18*     -.28**     5272.
Conventional                             sure     -.14      -.18*     -.20*      FA ga
JSS (N = 222)                            .14*     -.24**    -.15*     -.04       Ser
OCI (N = 157)
Humanistic/Helpful                       .17      -.17      -.07       .44**     25°
Affiliation                              .10      -.13      -.09       .39**     222“
Achievement                              .01      -.08      -.18      -.02       .21*
Self-Actualization                      -.16      -.02      -.05      -.22*     -.03
Approval                                -.12       .06      -.01      -.10       .06
Conventionality                         -.02       .03      -.10      -.20*     -.13
Dependence                               .01       .06      -.09       .04       .03
Avoidance                               -.02       .06      -.07       22272    -.05
Oppositional                            -.05      -.01      -.06      -.29**     .03
Power                                    .01      -.01      -.09      -.07       .19*
Competitive                              ais      -.28**     .01       .29**     25103-9
Perfectionism                            .01      -.04      -.11       .20*      23028

Note: EPQ = Eysenck Personality Questionnaire, 16PF = 16 Personality Factors Questionnaire, MMPI
= Minnesota Multiphasic Personality Inventory, SDS = Self Directed Search, JSS = Job Satisfaction
Scale, OCI = Organisational Culture Inventory; * p < 0.05; ** p < 0.01


The TPQue and Cattell's 16PF


Table 5 also provides correlations between the TPQue factor scales and the 16PF 
second order scales. The correlations were in the direction and generally of the mag- 
nitude predicted. In particular, the TPQue N, O, and C factor scales were positively 
correlated with Tension (Q4), Openness to Change (Q1), and Perfectionism (Q3)
scales, respectively, while the TPQue E factor scale was negatively correlated with 
the Self-Reliance (Q2) scale. 


The TPQue and observer ratings


Another study involved the correlation of three of the TPQue factor scales — Open- 
ness (O), Conscientiousness (C), and Agreeableness (A) — with corresponding rat- 
ings obtained from observers (friends and relatives), who were called to give ratings 
for the people who had previously completed the TPQue. As shown in Table 5, all 
factor scales were positively correlated with the corresponding observers' ratings, 
and two of them, the O and A factor scales, were substantial in magnitude. 


The TPQue and MMPI 


In another study, the TPQue factor scales were correlated with MMPI scales. As can 
be seen from the results presented in Table 5, the TPQue N factor scale was corre- 
lated positively with almost all MMPI scales (except the Masculinity/Femininity 
scale). This result is consistent with the theory that people prone to experience
negative emotions show higher levels of psychopathological symptoms (Avia, San- 
chez-Bernardos, Martinez-Arias, Silva, & Grana, 1995). Additionally, as expected, 
many of the MMPI clinical scales were negatively correlated with the TPQue E 
factor scale, a result which is consistent with the idea of Extraversion as a dimension 
of positive emotions (Wiggins & Pincus, 1989). The remaining TPQue factor scales 
showed specific relations with MMPI clinical scales, consistent with theoretical pre- 
dictions. For example, the TPQue O factor scale was correlated negatively with 
MMPI Depression (D), Hysteria (Hy), Psychasthenia (Pt) and Social Introversion- 
Extroversion (Si) scales, while the TPQue C factor scale was correlated negatively 
with Paranoia (Pa), Psychasthenia (Pt), Schizophrenia (Sc), Hypomania (Ma), and
Social Introversion-Extroversion (Si) scales. All the above findings are consistent with
previous research comparing scores on these measures (Avia et al., 1995; Costa, Busch,
Zonderman, & McCrae, 1986; Costa & McCrae, 1990). 


The TPQue and Holland's SDS 


Correlations of the TPQue factor scales with Holland's RIASEC vocational interest 
model are also presented in Table 5. Consistent with our expectations, the TPQue E 
factor scale was correlated positively with Enterprising and Social types, while the 
TPQue N factor scale was correlated negatively with Investigative, Realistic, Social,
and Conventional types. Furthermore, the TPQue O factor scale was mainly posi- 
tively correlated with the Artistic type, as predicted, while the TPQue A factor scale 
was correlated negatively with both Enterprising and Conventional types. Finally, 
the TPQue C factor scale was correlated positively with the Investigative, Enter- 
prising, and Conventional types. These findings are consistent with the existing lit- 
erature (Costa, McCrae, & Holland, 1984; De Fruyt & Mervielde, 1997; Tokar & 
Swanson, 1995). 


The TPQue and JSS 


In Table 5, correlations between TPQue factor scales and JSS are also provided. As 
expected, the TPQue E and C factor scales were correlated positively with job satis- 
faction (Diener, 1996; Robertson, Baron, Gibbons, MacIver, & Nyfield, 2000; Sal- 
gado, 1997), while the TPQue N and O factor scales were correlated negatively 
(Nikolaou, in press) with the same construct. 


The TPQue and OCI


The TPQue factor scales were also correlated with OCI scales. There are numerous 
significant associations between the TPQue personality dimensions and concepts 
measured by the OCI. For example, the TPQue A factor scale was correlated
positively with the Humanistic/Helpful and Affiliation scales, and negatively with the
Oppositional and Avoidance scales, as expected. Similarly, the TPQue C factor scale was
found to be positively correlated with Perfectionism, Competitive, Achievement, 
and Power scales, amongst others. Most of the other correlations were interpretable 
within the Big Five context. 

Additionally, almost all 30 convergent correlations between the factors and the
corresponding sub-scale scores ranged from .44 (N3) to .96 (O2), with a median of
.71. Only two sub-scales appear to have very low correlations with their factor
scores; these are A3 and C1. By contrast, the majority of the 150 discriminant
correlations (93.3%) were below .38.


Factor structure 


Since TPQue is intended to represent the five-factor model of personality, one obvi- 
ous test of its adequacy is how well its internal structure corresponds to the predic- 
tions of the model. There are at least two levels at which the factor structure of the 
TPQue can be examined: at the item level and at the sub-scale level. 

The TPQue consists of 180 items that define 30 six-item sub-scales grouped into 
five factor scales. Would an item factor analysis recover these scales? When five
varimax-rotated principal components were extracted, the majority of them corre- 
sponded to the hypothesized factors. A total of 144 of the 180 items (80 %) had 
their largest loading on the intended factor. For the 36-item factor scales, 32 E items, 

Table 6. Factor structure of the TPQue factor scales and sub-scales, and congruences for factor
scales and sub-scales after Procrustes rotation to the normative American structure

TPQue sub-scales            N        E        O        A        C     Congruence
Extraversion sub-scales
Warmth                    -.02      .64      .28      .51      .01      .97**
Gregariousness            -.24      .72     -.01     -.07      .04      .97**
Assertiveness             -.32      .53      .28     -.29      .34      .99**
Activity                  -.17      .62      .19     -.17      .26      .93**
Excitement Seeking        -.20      .65      .27     -.17      .19      .84
Positive Emotions          572]     .72      .20     -.13      .05      .93**
Neuroticism sub-scales
Anxiety                    .87     -.01     -.06     -.00      .02      .99**
Angry Hostility            .67      .18      .08     -.42     -.00      .95**
Depression                 .81     -.18     -.04      .10     -.14      .97**
Self-Consciousness         .53     -.38     -.26      .29     -.17      .86*
Impulsiveness              .40      .39      .04     -.17     -.35      .99**
Vulnerability              .79     -.10     -.15      .05     -.29      .98**
Openness sub-scales
Fantasy                    M3       .08      .61      sir     -.16      .90*
Aesthetics                 .04      .06      .66      .20      .16      .99**
Feelings                   .26      .29      .52      .34     -.01      .85
Actions                   -.11      .31      287,    -.07      .03      .96**
Ideas                     -.08      .19      .68     -.13      .09      .95**
Values                    -.15     -.11      .65     -.10     -.20      957
Agreeableness sub-scales
Trust                     -.08      .43     -.11      .54     -.04      .81
Straightforwardness        .00      .13      .26      .33      .16      .62
Altruism                   ali      .38      .01      .56      .10      .94**
Compliance                -.02     -.09      .04      .78     -.11      .97**
Modesty                    305     -.20      .02      .56      .05      .92*
Tender-Mindedness          .19      .32      217.     .58      .28      .91*
Conscientiousness sub-scales
Competence                -.38      .13      .01      .02      .70      .98**
Order                     -.03     -.08     -.20      .14      .57      .96**
Dutifulness               -.09     -.03      .00      127.     .74      .99**
Achievement Striving      -.04      .23      .01     -.03      .76      .97**
Self-Discipline           -.17      .06      .06      .08      .82      .96**
Deliberation              -.21     -.24     -.04      .14      .66      .99**
Factor Congruencies        .95**    .94**    292      MEOS     .95**    .93**

Note: N = Neuroticism, E = Extraversion, O = Openness to Experience, A = Agreeableness, C =
Conscientiousness; * Congruence higher than that of 95% of rotations from random data;
** Congruence higher than that of 99% of rotations from random data.



28 N items, 25 O items, 24 A items, and 33 C items had their highest loading on the
intended factor.

Furthermore, to determine the extent to which the five-factor model emerged
from the sample, a principal components analysis and an orthogonal Procrustes
rotation were performed on the 30 TPQue sub-scales. The loadings on the five factors,
which accounted for 56.33 per cent of the variance, are shown in Table 6. 

As shown in Table 6 each sub-scale has its highest loading on the intended factor, 
and where large secondary loadings appear, they are appropriate and meaningful. 
For example, E1 (Warmth) has a large positive loading on A (.51), because warm
people are generally sympathetic to others and eager to help them. Likewise, N5 
(Impulsiveness) has a large negative secondary loading on C, because people with 
low self-control are characterized by the inability to manage their impulses or de- 
sires. The above results are also consistent with findings from other cross-cultural 
studies (Pulver, Allik, Pulkkinen, & Hamalainen, 1995). 
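
The check described above, namely whether a sub-scale's largest loading falls on its intended factor and which secondary loadings are large, can be sketched as follows. The two loading rows are copied from Table 6; the function names and the .30 threshold for a "large" secondary loading are assumptions made for illustration only.

```python
FACTORS = ("N", "E", "O", "A", "C")


def primary_factor(loadings):
    """Return the factor on which a sub-scale has its largest absolute loading."""
    return max(zip(FACTORS, loadings), key=lambda pair: abs(pair[1]))[0]


def secondary_loadings(loadings, intended, threshold=0.30):
    """Return loadings on non-intended factors at or above the threshold."""
    return {factor: value for factor, value in zip(FACTORS, loadings)
            if factor != intended and abs(value) >= threshold}


# Loadings from Table 6, in the order N, E, O, A, C.
warmth = (-0.02, 0.64, 0.28, 0.51, 0.01)
impulsiveness = (0.40, 0.39, 0.04, -0.17, -0.35)

print(primary_factor(warmth), secondary_loadings(warmth, "E"))
print(primary_factor(impulsiveness), secondary_loadings(impulsiveness, "N"))
```

For Warmth the sketch reports E as the primary factor with a secondary loading on A, and for Impulsiveness it reports N as the primary factor with secondary loadings on E and C, in line with the interpretation given in the text.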

Moreover, in order to check whether the factor analytic results corresponded to 
the conceptual model, factor scores were computed for the principal factor solution, 
and then correlated with the ‘theoretical’ scores on the five scales, which were ob- 
tained by summing the scores from their related items (Caprara, Barbaranelli, Bor- 
goni, & Perugini, 1993; Costa & McCrae, 1992; Gudjonsson & Sigurdsson, 1999). 
The correlations were .89, .92, .81, .83, and .91, for E, N, O, A, and C, respectively. 
The results revealed that there was a substantial overlap between the factor scores 
and their ‘theoretical’ scores, which is another indication of the structural validity of 
the questionnaire. 

Factorial invariance in the TPQue is a necessary, though not sufficient, criterion 
for judging the success with which the instrument reflects the model. For this rea- 
son, the stability of the factor structure across different samples was also examined. 
Separate factor analyses were conducted for males and females (Tsaousis, 1999) as 
well as for non-applicants, job applicants, and employees (Tsaousis & Nikolaou, in 
press), and congruence coefficients (Harman, 1976) were calculated between the 
contrasted pairs¹. The results supported the high stability of the factorial structure of
the TPQue, since the same five factors were found in each group, and all the congruence
coefficients were above the critical value of .90 (Harman, 1976).
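
A congruence coefficient between two columns of loadings can be computed as in the sketch below. It uses the usual coefficient of congruence (the sum of cross-products divided by the square root of the product of the sums of squares), which is assumed here to be the index reported in the text. The first loading vector reuses the Extraversion loadings of the six E sub-scales from Table 6; the second vector and the group labels are invented.

```python
import math


def congruence(a, b):
    """Coefficient of congruence between two equally long loading vectors."""
    numerator = sum(x * y for x, y in zip(a, b))
    denominator = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return numerator / denominator


# E-factor loadings of the six Extraversion sub-scales in two groups
# (the second group is hypothetical).
group_1 = [0.64, 0.72, 0.53, 0.62, 0.65, 0.72]
group_2 = [0.60, 0.70, 0.58, 0.60, 0.61, 0.75]

phi = congruence(group_1, group_2)
print(round(phi, 3), phi > 0.90)
```

The coefficient is 1 for proportional loading patterns and -1 for exactly reversed ones, so values above .90 indicate that the same factor has been recovered in both groups.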

Given this convergence of the results of exploratory factor analysis, it seemed
advisable to apply confirmatory factor analysis (CFA) as yet another check on the
factor structure.


The models 


Several factor structures were theoretically possible, based on either the 30 scales or 
the 5 factors. In this CFA, seven different models were tested: 


Model A: Null model: This model had no common factors between the 30 TPQue 
sub-scales. It was used as a baseline for the other models. 

Model B: One factor model: This model assumed that all the sub-scales
loaded onto a single factor.


¹ The congruence coefficient is a statistical index that allows the extent of similarity or
dissimilarity between two sets of factors obtained from different samples or solutions to be determined.

Model C: TPQue model: This model was obtained by fixing all factor loadings
for the 30 sub-scales to the varimax-rotated values obtained in our
exploratory factor analysis. Each TPQue scale was an indicator only of its
Big Five structure, and no secondary loadings were allowed.

Model D: TPQue secondary loadings model: This model was obtained by imposing
the primary loadings of Model C in addition to the 26 secondary
loadings found to be .20 or higher in the varimax factor structure
solution from the exploratory factor analysis².

Model E: Standard Orthogonal model: This model represents the most parsimo- 
nious model with the five factor theory. Each sub-scale was loaded only 
on the intended factor (no secondary loadings), and no parameters were 
pre-estimated, but were left to vary freely. 

Model F: Oblique model: This model resulted from adding ten free parameters
to Model E. In particular, the five TPQue factors were allowed to
intercorrelate with each other.

Model G: Modification index model: Finally, this model was derived from the
program's modification suggestions in order to obtain a better fit. Thus,
three additional modifications were made to the parameters of Model E.
Firstly, the O and C factors were allowed to be correlated. Secondly, the
error terms of C4: Achievement Striving and O1: Fantasy were allowed to
be correlated with the A and C factors, respectively.


Table 7 displays the CFA results and reports absolute, relative and parsimonious 
indices of fit that comprehensively evaluate the fit of the different models computed. 

As Jöreskog and Sörbom (1993) have noted, the use of chi-square as a central $\chi^2$
statistic is based on the assumption that the model holds exactly in the population.
This assumption, however, is unreasonable and almost unattainable in most empirical
research (Jöreskog & Sörbom, 1993; Loehlin, 1992; Bollen, 1989; Church &
Burke, 1994). A consequence of this assumption is that models that hold approximately
in the population will be rejected in large samples. Thus, with models as
complex as personality models, in particular the Big Five, this statistic is not
expected to indicate a good fit to the data. For this reason, we concentrate on other
indices, which, as the literature has indicated, are less conservative and independent of
sample size.

The first two models constitute control models against which the five TPQue models
were examined. Table 7 shows that, according to all goodness-of-fit indices
(absolute, relative, or parsimonious), neither of these two models can be accepted.
Furthermore, all indices appear to have values worse than those of any of the subsequent
models. Model C, which was based on the exploratory factor solution, provided
a rather acceptable fit, with the ratio of chi-square to degrees of freedom
($\chi^2/df$) less than four (3.38). Most importantly, all three relative indices (TLI, NFI,
and CFI) together with the Parsimonious Normed-Fit Index (PNFI) also provided
acceptable values, indicating good fit (.93, .91, .94, and .85, respectively). This result
reconfirms the factor structure of the TPQue, as revealed by the exploratory
factor analysis. In fact, it provides another strong argument in support of the
hypothesis that the TPQue Concept Model is a measure of the Big Five model.


² For the E factor, secondary loadings were included for the sub-scales of Angry Hostility, Depression,
Self-Consciousness, Impulsiveness, Vulnerability, Actions, Ideas, Compliance, Modesty, Competence,
and Achievement Striving; for the N factor, loadings for Competence and Deliberation; for
the O factor, loadings for Warmth, Excitement-Seeking, and Modesty; for the A factor, loadings for
Warmth, Gregariousness, Angry Hostility, Feelings, and Dutifulness; and for the C factor, loadings for
Assertiveness, Impulsiveness, Vulnerability, Values, and Tender-Mindedness.


Table 7. Overall fit indices for the TPQue factor scales and sub-scales


                               Absolute Indices              Relative Indices    Parsimony Index
Models                         $\chi^2$    df     $\chi^2/df$    TLI    NFI    CFI    PNFI
Model A: Null                  2387.77     435    5.49           .88    .86    .88    .81
Model B: One factor            1791.22     409    4.39           .91    .89    .91    .79
Model C: TPQue                 1453.17     430    3.38           .93    .91    .94    .85
Model D: Second. loadings      1781.95     431    4.14           .91    .90    .92    .83
Model E: Orthogonal            1260.01     405    3.11           .94    .93    .95    .81
Model F: Oblique               1164.20     395    2.94           .94    .93    .96    .80
Model G: Modification Index    1194.97     403    2.97           .95    .93    .95    .81


Note: TLI = Tucker-Lewis Index; NFI = Normed Fit Index; CFI = Normed Noncentrality Fit Index;
PNFI = Parsimonious Normed-Fit Index.

In an attempt to improve the above model and to get a better fit to the data, Model
D was developed, including 26 secondary factor loadings, besides the primary
loadings from Model C. Unfortunately, this model proved to be worse than the previous
model ($\chi^2/df$ = 4.14, TLI = .91, NFI = .90, CFI = .92, and PNFI = .83). Model
C appeared to be significantly superior to Model D: $\chi^2_D$ (1781.95) $-$ $\chi^2_C$ (1453.17) =
$\chi^2_{diff}(1)$ = 328.95, $p < .0001$.

In Model E, a five factor orthogonal solution, almost all the parameters of the
model were left to run freely, instead of specifying the factor loadings for each
sub-scale (as in Model C). Only the first sub-scale's regression weight for each factor
was fixed at 1.0, in order to identify the model and to set the metric for the factor
variances. This model also provided a good fit to the data ($\chi^2/df$ = 3.11, TLI = .94,
NFI = .93, CFI = .95, and PNFI = .81) and in fact resulted in a significant
improvement over the two previous models: $\chi^2_C$ (1453.17) $-$ $\chi^2_E$ (1260.01) = $\chi^2_{diff}(25)$
= 193.16, $p < .0001$, and $\chi^2_D$ (1781.95) $-$ $\chi^2_E$ (1260.01) = $\chi^2_{diff}(26)$ = 521.94,
$p < .0001$.

Model F, where the factor scales were allowed to be intercorrelated, had the most
parsimonious fit with all indices suggesting good fit ($\chi^2/df$ =
2.94, TLI = .94, NFI = .93, CFI = .96, and PNFI = .80). Again this model was
significantly superior to Model E: $\chi^2_E$ (1260.01) $-$ $\chi^2_F$ (1164.20) = $\chi^2_{diff}(10)$ = 95.81,
$p < .0001$.

Finally, although Model G, as suggested by the AMOS modification index facility,
possessed a better fit than models C and E ($\chi^2/df$ = 2.97, TLI = .95, NFI = .93,
CFI = .95, and PNFI = .81), it did not provide an improvement over the E model.
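
The nested-model comparisons reported above amount to subtracting chi-square values and degrees of freedom. The following sketch reproduces that arithmetic for Models C, E, and F using the values in Table 7; the class and function names are illustrative assumptions, and no p-value is computed here (that would require a chi-square distribution function from a statistics library).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelFit:
    """Chi-square and degrees of freedom of one fitted model (values from Table 7)."""
    name: str
    chi_square: float
    df: int

    @property
    def ratio(self) -> float:
        # Ratio of chi-square to degrees of freedom.
        return self.chi_square / self.df


def chi_square_difference(restricted: ModelFit, freer: ModelFit):
    """Return (difference in chi-square, difference in df) for two nested models."""
    return restricted.chi_square - freer.chi_square, restricted.df - freer.df


model_c = ModelFit("Model C", 1453.17, 430)
model_e = ModelFit("Model E", 1260.01, 405)
model_f = ModelFit("Model F", 1164.20, 395)

for restricted, freer in [(model_c, model_e), (model_e, model_f)]:
    d_chi, d_df = chi_square_difference(restricted, freer)
    print(f"{restricted.name} vs {freer.name}: chi-square difference ({d_df} df) = {d_chi:.2f}")
```

Recomputed ratios may differ from the tabled values in the second decimal place, depending on whether the original figures were rounded or truncated.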

In summary, whereas almost all TPQue models proved to fit the data well, the
most parsimonious was the model with all its factor scales intercorrelated
(Model F). The above results can be interpreted in two ways: on the one hand,
there is enough evidence to support the claim that confirmatory factor analysis
reconfirmed the results obtained by exploratory factor analysis, which indicated
that the TPQue Concept Model is indeed a Big Five model. On the other hand, CFA
demonstrated that the best fit to the data was achieved when all five factor
scales were correlated with each other.

This result is consistent with the theory of Big Five, in the sense that it is not pos- 
sible for personality factors not to be correlated to each other, since they constitute 
dimensions of the same construct. For example, circumplex studies suggest that the 
personality domain may be better represented as a continuum of personality distinc- 
tions, rather than as a set of clearly separable dimensions (Wiggins, 1979; Kiesler, 
1983; Saucier, 1992). It is also consistent with the Concept Model of the TPQue, 
which suggests that its five factor scales are all correlated with each other to some 
extent. 


Standardization of the TPQue — Norms 


Participants 


The normative sample on which the TPQue profile forms are based consists of 1,054
students, of whom 868 were recruited from 7 Greek universities and
186 from two high schools (a private school in Athens and a public school in the
province of Messinia). The mean age of the participants was 19.9 years (SD = 4.32);
410 (39%) were male and 644 (61%) were female. This proportion
is very close to the proportion of male and female students studying at Greek
universities (41% and 59%, respectively)³. The average age was 20.04
years (SD = 4.81) for male students and 19.8 years (SD = 3.99) for female students.
Details of the sample that participated in the standardization procedure are provided in
Table 8.


³ Source: Minister of Education (1992). Personal correspondence.




Table 8. TPQue normative sample in terms of region and faculty (N = 1,054)


Institutes                   Combined Sample    Males          Females
Region
  University of Crete        429 (40.7)         133 (32.4)     296 (46.0)
  University of Athens       82 (7.8)           26 (6.3)       56 (8.7)
  University of Patras       93 (8.8)           17 (4.1)       76 (11.8)
  University of Macedonia    32 (3.0)           15 (3.6)       17 (2.6)
  Panteios University        85 (8.1)           27 (6.6)       58 (9.0)
  [?]                        69 (6.5)           55 (13.4)      14 (2.2)
  Ziridis High School        92 (8.7)           58 (14.1)      34 (5.3)
  Kalamata High School       94 (8.9)           37 (9.0)       57 (8.9)
  Various Universities       78 (7.5)           41 (10.5)      [?]
  Total                      1054 (100%)        410 (100%)     644 (100%)
Faculty
  Classical Studies          308 (29.2)         49 (12.0)      259 (40.1)
  Social Sciences            414 (39.2)         135 (33.0)     279 (43.2)
  Science                    272 (25.7)         189 (46.1)     83 (12.8)
  Other                      60 (5.9)           37 (8.9)       23 (3.9)
  Total                      1054 (100%)        410 (100%)     644 (100%)


Note: University of Crete is dispersed in 3 different cities, Chania, Rethymno, and Irakleio. The
above mentioned figure of 429 students contains samples from all three cities in the following
proportion: Chania: 62 (5.8%), Rethymno: 214 (20.3%), and Irakleio: 153 (14.6%). The figures in
brackets represent the percentage of the sample.


Norms of the TPQue 


Although the TPQue was initially standardized for a college age population, it has
also been successfully used in other population groups. It shows the same factor
structure in student and non-student respondents as well as clinical and non-clinical
respondents, and has been extensively validated for women as well as men. The TPQue
provides norms for the college age population (Tsaousis, 1999), the occupational population
(applicants as well as employees; Tsaousis & Nikolaou, in press), the clinical
population (Tsaousis & Semkou, 1999), and the high school student population
(Tsaousis, 1996). The TPQue uses two different normative systems, percentiles and T scores.
Additional norms are being developed for other groups at the present time.
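
As an illustration of the two normative systems, the sketch below converts a raw scale score to a T score and to a percentile. It assumes the conventional linear T transformation (mean 50, SD 10) and, for the percentile, a normal approximation; published norm tables are typically built from the empirical score distribution, so this is not the TPQue's own procedure. The mean and SD are the combined-sample Extraversion values from Table 9, and the raw score is hypothetical.

```python
from statistics import NormalDist


def t_score(raw, norm_mean, norm_sd):
    """Linear T score: 50 plus 10 times the standardized deviation from the norm mean."""
    return 50 + 10 * (raw - norm_mean) / norm_sd


def normal_percentile(raw, norm_mean, norm_sd):
    """Percentile rank under a normal approximation (an assumption, see above)."""
    return 100 * NormalDist(norm_mean, norm_sd).cdf(raw)


# Combined-sample Extraversion norms from Table 9.
EXTRAVERSION_MEAN = 122.40
EXTRAVERSION_SD = 17.16

raw_score = 139.56  # hypothetical respondent, one SD above the norm mean
print(f"T = {t_score(raw_score, EXTRAVERSION_MEAN, EXTRAVERSION_SD):.1f}")
print(f"percentile = {normal_percentile(raw_score, EXTRAVERSION_MEAN, EXTRAVERSION_SD):.1f}")
```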


Descriptive characteristics of the TPQue 


The means and standard deviations of all TPQue factor scales and sub-scales in
the total sample and in the male and female sub-samples are given in Table 9; t-tests
were also conducted to investigate whether there were significant differences
between males and females on all scales. The results indicate that females obtained
higher scores than males in Neuroticism [t(1052) = -8.62, p < .0001], Openness
[t(1052) = -5.94, p < .0001], Agreeableness [t(1052) = -6.89, p < .0001], and
Conscientiousness [t(1052) = -3.13, p < .001], while there were no significant
differences between males and females in Extraversion [t(1052) = .81, ns]. The results
are consistent with studies where other personality inventories have been used (e.g.,
Costa & McCrae, 1992; Eysenck & Eysenck, 1975).


Table 9. Means and Standard Deviations for TPQue Factor scales and sub-scales


                               Males             Females            Combined
                               (N=410)           (N=644)            (N=1,054)
                               M        SD       M         SD       M        SD
Factor Scales
  Extraversion                 122.93   17.18    122.06    17.15    122.40   17.16
  Neuroticism                  106.14   17.68    116.14*   18.74    112.25   18.96
  Openness to Experience       122.93   15.37    128.73*   15.49    126.48   15.69
  Agreeableness                115.04   12.38    120.61*   13.05    118.44   13.07
  Conscientiousness            112.97   17.97    116.40*   16.95    115.07   17.43
Extraversion Sub-scales
  Warmth                       19.88    3.49     21.00*    3.28     20.57    3.40
  Gregariousness               20.92    4.13     21.19     4.13     21.08    4.13
  Assertiveness                19.56    4.02     18.96*    4.23     19.19    4.16
  Activity                     20.55    3.71     19.62*    3.69     19.98    3.72
  Excitement-Seeking           20.87    3.76     19.97*    3.54     20.32    3.65
  Positive Emotions            21.15    3.77     21.32     3.89     21.26    3.84
Neuroticism Sub-scales
  Anxiety                      18.78    4.41     20.67*    4.67     19.94    4.66
  Angry Hostility              18.37    4.44     19.99*    4.60     19.36    4.61
  Depression                   17.80    4.10     19.65*    4.43     18.93    4.39
  Self-Consciousness           17.26    4.25     18.41*    4.33     17.96    4.34
  Impulsiveness                18.92    3.33     19.64*    3.39     19.36    3.38
  Vulnerability                15.00    4.76     17.77*    4.74     16.69    4.93
Openness Sub-scales
  Fantasy                      21.96    4.26     22.66*    4.08     22.39    4.15
  Aesthetics                   20.43    5.05     23.16*    4.21     22.10    4.74
  Feelings                     20.51    3.42     22.35*    3.49     21.63    3.57
  Actions                      19.99    3.79     20.75*    3.93     20.46    3.89
  Ideas                        20.02    3.37     19.66     3.43     19.80    3.41
  Values                       20.02    4.57     20.15     4.46     20.10    4.50
Agreeableness Sub-scales
  Trust                        20.79    3.37     20.89     3.37     20.85    3.37
  Straightforwardness          18.04    3.31     18.96*    3.31     18.60    3.34
  Altruism                     19.24    3.00     19.97*    3.06     19.69    3.06
  Compliance                   18.00    3.84     19.14*    4.05     18.70    4.01
  Modesty                      18.67    3.90     19.53*    3.97     19.20    3.96
  Tender-Mindedness            20.31    3.07     [?]       3.08     21.41    3.17
Conscientiousness Sub-Scales
  Competence                   20.32    3.80     20.05     3.52     20.16    3.64
  Order                        17.15    5.00     18.36*    4.64     17.89    4.82
  Dutifulness                  18.28    3.59     19.21*    3.61     18.85    3.63
  Achievement Striving         19.88    3.78     20.44*    3.73     20.22    3.76
  Self-Discipline              18.58    3.86     19.33*    3.57     19.04    3.70
  Deliberation                 18.76    3.95     19.01     4.12     18.91    4.05


Note: * Significant differences between males and females at p < 0.05.
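
The reported sex differences can be checked approximately from the summary statistics in Table 9. The sketch below assumes a pooled-variance independent-samples t-test, which is consistent with the reported 1052 degrees of freedom (410 + 644 - 2) but is not stated explicitly in the text; the function name is illustrative.

```python
import math


def pooled_t(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Independent-samples t statistic with pooled variance, from summary statistics."""
    df = n_1 + n_2 - 2
    pooled_variance = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / df
    standard_error = math.sqrt(pooled_variance * (1 / n_1 + 1 / n_2))
    return (mean_1 - mean_2) / standard_error, df


# Neuroticism: males (M = 106.14, SD = 17.68, N = 410) vs. females (M = 116.14, SD = 18.74, N = 644).
t_value, df = pooled_t(106.14, 17.68, 410, 116.14, 18.74, 644)
print(f"t({df}) = {t_value:.2f}")
```

Because the tabled means and standard deviations are rounded, the recomputed statistic can differ from the published value in the second decimal place.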

Summarizing, the Traits Personality Questionnaire (TPQue) contains 206 items, 
requires approximately a fifth-grade reading level, and provides evidence of sound 
psychometric properties across different samples. Research evidence suggests that 
the TPQue is already beginning to prove its utility not only as a research tool but 
also in a number of applied settings, including personal development counseling, 
vocational guidance, personnel selection and appraisal. Although some of its sub- 
scales need revision in order to improve its effectiveness in measuring the Big Five 
constructs, the measure appears to be a useful measurement tool for assessing 'nor- 
mal' personality. Finally, although we already have evidence for discriminant and 
convergent validity, experimental studies are needed to provide more concurrent 
validity information, especially with other Big Five scales when these are available 
in the Greek language. 
BIG FIVE ADJECTIVE SCALES 


Chapter 11 


The Interpersonal Adjective Scales: Big Five 
Version (IASR-B5) 


Jerry S. Wiggins 
Krista K. Trobst 


Introduction 


"The five-factor model provides a larger framework in which to orient and interpret 
the circumplex, and the interpersonal circle provides a useful elaboration about
aspects of two of the five factors" (McCrae & Costa, 1989, p. 593).

Recent research in personality structure has emphasized both five-factor and cir- 
cumplex structural models (e.g., Wiggins & Pincus, 1992) and, as the opening quo- 
tation from McCrae and Costa makes clear, these two models are seen as comple- 
mentary rather than competitive (see Figure 1). The Interpersonal Adjective Scales 
— Revised: Big Five Version (IASR-B5) were constructed to provide “. . . a highly 
efficient instrument for combined circumplex and five-factor assessment" (Trapnell 
& Wiggins, 1990, p. 781, emphasis added). 

As discussed elsewhere (Wiggins, 1995), the Interpersonal Adjective Scales are
embedded in a conceptual framework that has its origins in five venerable traditions:
(1) the lexical tradition (John, Angleitner, & Ostendorf, 1988), (2) the interpersonal
theory tradition in clinical psychology and psychiatry (Kiesler, 1996), (3) the
traditions of order and facet analysis (Guttman, 1966), (4) the social exchange and
impression management traditions (Carson, 1969), and (5) the multivariate-trait
tradition (Wiggins & Trapnell, 1997). Over time, the scales have been modified for
different purposes and their names changed to reflect these modifications: the
Interpersonal Adjective Scales were first (IAS; Wiggins, 1979), followed by the revised
Interpersonal Adjective Scales (IAS-R; Wiggins, Trapnell, & Phillips, 1988), and by
the extension of the Interpersonal Adjective Scales to include the Big Five
dimensions of personality (IASR-B5; Trapnell & Wiggins, 1990).


Big Five Assessment, edited by B. De Raad & M. Perugini. © 2002, Hogrefe & Huber Publishers.


[Figure 1 has two panels. Five Factor Model: I Surgency/Extraversion, II Agreeableness,
III Conscientiousness, IV Neuroticism, V Openness to Experience. Circumplex Model:
Assured-Dominant, Gregarious-Extraverted, Warm-Agreeable, Unassuming-Ingenuous,
Unassured-Submissive, Aloof-Introverted, Cold-hearted, Arrogant-Calculating.]


Figure 1. Five-factor model of personality and circumplex model of interpersonal behavior (from
Trapnell & Wiggins, 1990, page 782).


Interpersonal Adjective Scales (IAS)


The Interpersonal Adjective Scales (IAS) were conceived at Oregon Research Institute
during the 1970s in a collaborative project with Lewis R. Goldberg. Briefly,
Goldberg (1977) had developed his well-known lexical taxonomy of trait-descriptive
terms, with adjectives classified according to word usage in self- and
other-description, and he was encouraging other investigators to develop alternative
taxonomies for comparative purposes. With the help of other investigators at ORI¹,
Wiggins developed a psychological taxonomy of trait-descriptive terms which was
conceptually based (rather than based solely on word meanings). The universe of
content here was the 18,125 terms classified by Norman (1967) and further
developed by Goldberg (1977). In our taxonomy, what had previously been considered by
Goldberg to be a taxon of "stable biophysical traits" was further differentiated into
interpersonal traits, material traits, temperamental traits, social roles, character, and
mental predicates (Wiggins, 1979). Within the category of approximately 800
interpersonal trait terms, an initial taxonomy was developed with reference to the
writings of Timothy Leary (1957). The measurement model adopted to represent this
category of traits was the circumplex model of Louis Guttman (1954).


¹ James M. Kilkowski and Alexander Galvin.

In the circumplex model, variables are arrayed in a circular fashion around the
two orthogonal bipolar axes of dominance/submissiveness and nurturance/coldness. 
A series of initial attempts to develop circumplex measures for the interpersonal 
categories described by Leary (1957) were unsuccessful due to notable gaps in cov- 
erage and a lack of bipolarity among certain categories presumably located opposite 
to one another on Leary's circle (Wiggins, 1979, pp. 400-402). Revision of the 
original Leary categories and new item selection procedures resulted in eight 16- 
item genuinely bipolar categories. This 128-item version was called the Interper- 
sonal Adjective Scales (IAS) and its empirical circumplex structure was among the 
best reported in the literature up to that time (Wiggins, Steiger, & Gaelick, 1981). 


Interpersonal Adjective Scales — Revised (IAS-R) 


The IAS was revised in order to: (a) provide a short-form measure that would make 
it more convenient for investigators to include the IAS in test batteries and (b)
ensure that the two circumplex dimensions were orthogonal to the remaining three
dimensions (i.e., Neuroticism, Conscientiousness, and Openness) of the Big Five
factors of personality research (Wiggins et al., 1988). With respect to the latter goal, 
both published and unpublished feedback from colleagues had suggested appropriate 
modifications. For example, Peabody and Goldberg (1989) had suggested that the 
ambitious (P) versus lazy (H) contrast is not strictly interpersonal in nature and 
McCrae (personal communication, September 4, 1986) had indicated that a joint 
factor analysis of the IAS and the NEO-PI confirmed that the IAS ambitious scale 
was most strongly associated with the NEO Conscientiousness domain. Further- 
more, Kiesler (1983) had previously indicated that the P versus H contrast was best 
interpreted as assured (P) versus unassured (H). Our revision of IAS-R substantiated 
Kiesler's interpretation and was in accord with the suggestions of Peabody and 
Goldberg and with those of McCrae (although the IAS-R P vs. H dimension contin- 
ues to demonstrate a moderate association with the achievement striving facet of the 
NEO-PI-R; Costa & McCrae, 1995). The structure of the 64-item version of IAS 
(IAS-R) is illustrated in Figure 2. This version has been shown to meet the strong 
geometric and substantive assumptions involved when classifying persons into ty- 
pological categories (Wiggins, Phillips, & Trapnell, 1989). It is also the version that 
is available as a commercial test (Wiggins, 1995). 




[Figure 2 shows the eight octants arranged around the circle: Warm-Agreeable (LM, 0°),
Gregarious-Extraverted (NO, 45°), Assured-Dominant (PA, 90°), Arrogant-Calculating (BC, 135°),
Cold-hearted (DE, 180°), Aloof-Introverted (FG), Unassured-Submissive (HI, 270°), and
Unassuming-Ingenuous (JK, 315°).]


Figure 2. Circumplex structure of revised IAS (from Wiggins, 1995, page 4).


Interpersonal Adjective Scales - Big Five Version (IASR-B5) 


Scale construction 


Our recognition that the Dominance and Nurturance coordinates of the interpersonal
circumplex are rotational variants of the Extraversion and Agreeableness dimensions
of the five-factor model (e.g., McCrae & Costa, 1989) led us to construct an
adjectival Big Five version of the Interpersonal Adjective Scales (IASR-B5; Trapnell &
Wiggins, 1990). Once again, scale construction was based on Goldberg's (1977)
seminal item pool of 1,710 trait-descriptive adjectives that was derived from the
dictionary studies of Allport and Odbert (1936) and of Norman (1967). We used the
portion of Goldberg's data set that included ratings of self-applicability of each of
the 1,710 adjectives by 187 university undergraduates. Homogeneous clusters of
items relating to neuroticism (N = 30), conscientiousness (N = 25), and openness to
experience (N = 44) were selected from this data set that were identical to or close
synonyms of adjectives found to have high factor loadings in previous
factor-analytic studies (e.g., Goldberg, 1985; McCrae & Costa, 1985). These three clusters
were supplemented by items not included in the 1,710-item list (primarily negations
of high loading items) and the total item set was administered to two new samples of
581 and 360 undergraduates. The 10 highest loading positive and the 10 highest
loading negative items were retained within each factor (see Trapnell & Wiggins,
1990).

Three 20-item balanced scales of neuroticism, conscientiousness, and openness to
experience, together with our previously developed circumplex measure of dominance
and nurturance, constitute the final version of IASR-B5. It should be noted
that in the present version of the IASR-B5 (available from the authors) the first 64
items are identical in format, item order, and glossary to those found in the
commercial version of the IAS. Thus, all findings with respect to the eight octant
scales and circumplex of IAS-R (e.g., Wiggins, 1995) apply in toto to the eight
octant scales and circumplex of IASR-B5.

The final version of IASR-B5 therefore consists of 124 adjectives that are rated 
for self-descriptiveness on an eight-place Likert scale ranging from “1 = extremely 
inaccurate" to "8 = extremely accurate". The first 64 adjectives comprise the Domi- 
nance and Nurturance factors underlying the interpersonal circumplex and are used 
to score the eight octants (i.e., eight items per octant) of the Interpersonal Adjective 
Scales (Wiggins, 1995). Weighted combinations of these octant scores may then be 
computed for those wishing to work with factor scores. High loading Dominance 
items include dominant, assertive, domineering, and forceful, and high loading 
Nurturance items include tenderhearted, kind, charitable, and sympathetic. 
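
The idea of weighting octant scores to obtain Dominance and Nurturance coordinates can be sketched as follows. This is an illustration under stated assumptions, not the published scoring key: it assumes standardized octant scores, the conventional octant midpoints (LM at 0°, then 45° steps through NO, PA, BC, DE, FG, HI, JK), and sine and cosine weights, and it omits any scaling constant that an official scoring procedure may apply.

```python
import math

# Assumed angular locations of the octants in degrees (conventional midpoints).
OCTANT_ANGLES = {"LM": 0, "NO": 45, "PA": 90, "BC": 135,
                 "DE": 180, "FG": 225, "HI": 270, "JK": 315}


def circumplex_coordinates(octant_z_scores):
    """Return unscaled (dominance, nurturance) from eight standardized octant scores."""
    if set(octant_z_scores) != set(OCTANT_ANGLES):
        raise ValueError("scores for all eight octants are required")
    dominance = sum(z * math.sin(math.radians(OCTANT_ANGLES[octant]))
                    for octant, z in octant_z_scores.items())
    nurturance = sum(z * math.cos(math.radians(OCTANT_ANGLES[octant]))
                     for octant, z in octant_z_scores.items())
    return dominance, nurturance


# Hypothetical profile: elevated on Assured-Dominant, low on Unassured-Submissive.
profile = {octant: 0.0 for octant in OCTANT_ANGLES}
profile["PA"] = 1.0
profile["HI"] = -1.0

dom, lov = circumplex_coordinates(profile)
print(f"dominance = {dom:.2f}, nurturance = {lov:.2f}")
```

With these weights, octants on the vertical axis contribute only to dominance and octants on the horizontal axis only to nurturance, while the diagonal octants contribute to both.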

The remaining 60 items of the IASR-B5 comprise the Neuroticism, Openness,
and Conscientiousness factors (i.e., 20 items per factor, half of which are
reverse-scored). High loading Neuroticism items include worrying, tense, anxious, and
nervous. High loading Openness items include philosophical, inquisitive,
imaginative, and abstract-thinking. And high loading Conscientiousness items include
organized, tidy, orderly, and planful.
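
Scoring a balanced scale with reverse-keyed items on the eight-place rating format can be sketched as below. The reversal rule (9 minus the rating) follows from the 1 to 8 response range; averaging the keyed ratings rather than summing them is an assumption, and the two reverse-keyed adjectives are hypothetical placeholders rather than actual IASR-B5 items.

```python
SCALE_MIN, SCALE_MAX = 1, 8  # eight-place rating format


def reverse(rating):
    """Reverse-key a rating on the 1 to 8 scale (1 becomes 8, 8 becomes 1)."""
    return SCALE_MIN + SCALE_MAX - rating


def score_scale(ratings, reverse_keyed):
    """Mean of the keyed ratings; `ratings` maps adjective -> rating."""
    keyed = []
    for adjective, rating in ratings.items():
        if not SCALE_MIN <= rating <= SCALE_MAX:
            raise ValueError(f"rating out of range for {adjective!r}: {rating}")
        keyed.append(reverse(rating) if adjective in reverse_keyed else rating)
    return sum(keyed) / len(keyed)


# 'worrying' and 'tense' are named in the text; 'calm' and 'relaxed' are
# hypothetical reverse-keyed items used only for illustration.
ratings = {"worrying": 7, "tense": 6, "calm": 2, "relaxed": 3}
neuroticism_score = score_scale(ratings, reverse_keyed={"calm", "relaxed"})
print(neuroticism_score)
```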

The IASR-B5 takes approximately 20 minutes to complete, and a glossary is provided
to each respondent to ensure knowledge of word meanings. Although the
IASR-B5 has primarily been used as a self-report inventory, more recent research
has suggested that it may also be used in peer-report format. The IASR-B5 is suitable
for use in research, industry, and clinical settings, although it is best employed
among more educated respondents because a 10th-grade reading level is required.


Table 1. Principal components of five-factor domain scales.


Scale                       I    II    III    IV    V


Extraversion (NEO)          .84
Dominance (IAS)             .82
Sociability (HPI)           .81
Ambition (HPI)              .71  .38


Neuroticism (NEO)           .97
Neuroticism (IAS)           .89
Adjustment (HPI)            -.84


Openness (IAS)              .85
Openness (NEO)              .80  .35
Intellect (HPI)             .72


Love (IAS)                  .83
Agreeableness (NEO)         .81
Likability (HPI)            .45  .68


Conscientiousness (IAS)     .84
Conscientiousness (NEO)     .83
Prudence (HPI)              .65


Note: Adapted from Wiggins and Pincus (1994, p. 84). N = 581; loadings < .33 omitted. NEO = NEO
Personality Inventory; IAS = Extended Interpersonal Adjective Scales; HPI = Hogan Personality
Inventory.


Psychometric characteristics 


Convergent and discriminant validity 


The convergent and discriminant validities of the IASR-B5 were established (Wig- 
gins & Pincus, 1994) with reference to a sample of 581 undergraduates who had 
been administered the NEO-PI (Costa & McCrae, 1985), the Hogan Personality In- 
ventory (HPI; Hogan, 1986), and the IASR-B5 (Trapnell & Wiggins, 1990). These 
findings appear in Table 1. Overall, it was concluded that IASR-B5 has an excellent 
structure on the item level, internally consistent scales, and promising convergent 
and discriminant properties when compared with the NEO Personality Inventory and 
the Hogan Personality Inventory (Trapnell & Wiggins, 1990). 


Peer ratings 


Costa and McCrae (1995) administered the Revised NEO Personality Inventory in
peer rating format to 380 participants (aged 19 to 96) in the Baltimore Longitudinal
Study of Aging. Peer ratings were also obtained from some of these participants on
either the Goldberg (1992) Big Five transparent adjectival markers (N = 150) or the
IASR-B5 (N = 128). Table 2 presents the correlations of Wiggins' and Goldberg's
measures with NEO-PI-R facets in these peer ratings.


Table 2. Correlations of Wiggins' and Goldberg's measures with NEO-PI-R facets in peer ratings


[Table 2 body: the individual correlation values are not recoverable. Columns: Factors N, E, O, A,
and C, each with a W and a G column. Rows: NEO-PI-R facet scales N1: Anxiety, N2: Angry Hostility,
N3: Depression, N4: Self-Consciousness, N5: Impulsiveness, N6: Vulnerability; E1: Warmth,
E2: Gregariousness, E3: Assertiveness, E4: Activity, E5: Excitement Seeking, E6: Positive Emotions;
O1: Fantasy, O2: Aesthetics, O3: Feelings, O4: Actions, O5: Ideas, O6: Values; A1: Trust,
A2: Straightforwardness, A3: Altruism, A4: Compliance, A5: Modesty, A6: Tender-Mindedness;
C1: Competence, C2: Order, C3: Dutifulness, C4: Achievement Striving, C5: Self-Discipline,
C6: Deliberation.]


Note: Adapted from Costa & McCrae (1995, p. 35). NEO-PI-R = Revised NEO Personality Inventory, N
= Neuroticism, E = Extraversion, O = Openness to Experience, A = Agreeableness, C = Conscientiousness,
W = Wiggins' Revised Interpersonal Adjective Scales - Big Five Version (IASR-B5), G = Goldberg's
Transparent Trait Rating Form (TTRF). N = 150 for IASR-B5, N = 128 for TTRF. Correlations
above ± .40 are given; all are significant at p < .001.


Table 3. Internal consistency of IAS scales (Cronbach's alpha).


                                 Sample
                                 Adult*         College student
IAS Scale                        (N = 1,083)    (N = 2,825)
Assured-Dominant (PA)            .790           .848
Arrogant-Calculating (BC)        .865           .841
Cold-hearted (DE)                .810           .841
Aloof-Introverted (FG)           .840           .860
Unassured-Submissive (HI)        .815           .852
Unassuming-Ingenuous (JK)        [?]            .738
Warm-Agreeable (LM)              .850           .865
Gregarious-Extraverted (NO)      .835           .843


Note: Adapted from Wiggins (1995, p. 44); * Adult sample includes employment selection, BLSA,
and volunteer samples.


With the exception of the Openness dimension, the facets of the NEO-PI-R are
reasonably well represented in both the Goldberg (G) and the Wiggins (W) adjectival
scales. With respect to the Openness to Experience dimension, the IASR-B5
shares the Goldberg approach of focusing primarily upon cognitive aspects (i.e.,
intellect or ideas) of openness, unlike the broader approach of the NEO-PI-R that
assesses additional facets. Also, the Goldberg and Wiggins scales provide substantial
markers of Neuroticism facets, although the Goldberg scales might also be considered
a negative measure of NEO-PI-R Agreeableness. Similarly, the Wiggins
measure of Extraversion is negatively loaded on NEO-PI-R Agreeableness. In general,
however, the IASR-B5 performs well under peer-rating instructions with respect
to the NEO-PI-R.


Internal consistency of IASR-B5 self-report 


The results of studies of the internal consistency of the eight interpersonal adjective 
scales of IAS-R apply to IASR-B5 for reasons already discussed. Table 3 presents 
Cronbach's alpha coefficients in an adult sample (N = 1,083) consisting of an 
employment selection group, a sample from the Baltimore Longitudinal Study of 
Aging, and a volunteer sample of adults. Alpha coefficients were also computed in a 
university sample (N = 2,825). With the possible exception of the 
Unassuming-Ingenuous scale (JK), the coefficients of internal consistency are substantial². 
Internal consistency coefficients were also reported by Trapnell and Wiggins (1989), 
based on a sample of 941 undergraduates, for Neuroticism, Conscientiousness, and 
Openness to Experience. A later analysis is also available, however, based on a 
much larger sample (N = 2,825), in which alphas were .90 for Neuroticism, .93 for 
Conscientiousness, and .87 for Openness. 
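
For readers who wish to compute comparable coefficients on their own data, the following is a minimal sketch of Cronbach's alpha using the standard formula (k items, item variances, and the variance of the total score). The function name and the small data set are hypothetical illustrations and are not taken from the IAS-R studies.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(scores[0])
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the summed scale score
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: four respondents, three adjectives on one scale
example = [
    [6, 5, 6],
    [3, 4, 3],
    [7, 7, 6],
    [2, 3, 2],
]
print(round(cronbach_alpha(example), 3))
```

The sketch assumes complete data and at least two items; it fails with a division by zero when every respondent has the same total score.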


2 The relative lack of internal consistency of the IAS-R JK scale would appear to reflect on the 
dimension itself rather than on the measuring instrument. A preliminary review of the literature 
strongly suggested that this dimension tends to be the least internally consistent for almost all 
circumplex measures, regardless of content. 
Table 4. Internal and temporal reliability estimates for informant ratings using IAS. 


                                  Time 1ᵃ            Time 2ᵃ 

                                  M   SD   α         M   SD   α      Retest r   ES   SEM 
Dominance Coordinate (DOM) 0.59 1.17 -— 045 0.96 -- 75 а сан 
Nurturance Coordinate (LOV) 0.66 1.59 --> 0.86 1.40 --° M3 130183 
Octant Scales: 

Assured-Dominant (PA) 5.3 1.0 .85 5.3 1.0 .80 .67 .05 .064 
Arrogant-Calculating (BC) 3.1 12529991 3.0 15 .92 ani -.06 .083 
Cold-hearted (DE) 24e 132. a88 289 51 .91 .67 .09 .068 
Aloof-Introverted (FG) ОАТ 186 89) 2.2 ils 88 70 .12 .060 
Unassured-Submissive (НІ) ONE 2E 55 2.9 0.2: .86 ‚64 -.05 .074 
Unassuming-Ingenuous (JK) 4.7 1.4 .83 4.9 1.3 .84 .62 .14 .088 
Warm-Agreeable (LM) * 6381. 2.118.92 6.1 the”? 91 ‚69 -.15 .067 
Gregarious-Extraverted (NO) 6.6 1.0 .89 6.4 1.0 .87 .70 -.12 .057 


Note: Adapted from Kurtz et al. (1999, p. 108). IAS = Interpersonal Adjective Scales; Time 1 = 
first administration; Time 2 = second administration 6 months after Time 1; ES = effect size for 
mean score change; SEM = standard error of measurement; ᵃ N = 109. ᵇ Coefficient alpha was not 
computed for DOM and LOV coordinates, since these are weighted composites of the octant scores; 
* p < .05 for mean score difference between Time 1 and Time 2. 


Internal consistency and temporal stability of IAS-R informant ratings³ 


Because it is both brief and comprehensive, IAS-R is well suited for use in peer- 
rating and spouse-rating studies, particularly those calling for repeated measure- 
ments. In this respect, Kurtz, Lee, and Sherker (1999) have provided important data 
regarding both the internal consistency and temporal stability of informant ratings 
based on IAS-R. The findings of Kurtz et al. are reproduced in Table 4 in which 
alpha coefficients are provided for both Time 1 and Time 2 (separated by six 
months), as well as retest correlations. 

The mean values for the Dominance (DOM) and Nurturance (LOV) coordinates 
reported in Table 4 were calculated employing geometric formulae of weighted octant 
z-scores, where DOM = .3[(zPA - zHI) + .707(zNO + zBC - zFG - zJK)] and 
LOV = .3[(zLM - zDE) + .707(zNO - zBC - zFG + zJK)] (see Wiggins, 1995, p. 17). 
The alpha coefficients are substantial for both Time 1 and Time 2 and the retest co- 
efficients are moderate (range .62 to .71). One of the more important values for per- 
sonality appraisal with IAS-R is the angular location which provides a directional 
summary of an individual's self-report (or of an informant's report) and thereby in- 
dicates the typological octant in which these reports may be classified. To the extent 
that these angular locations vary over time, an important change has taken place in 
the individual's self-view (or in the informant's view of the individual). Kurtz et al. 
(1999) provide an excellent discussion of this issue as it relates to diagnostic classi- 
fication with IAS-R. 
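
The coordinate formulae and the notion of angular location can be illustrated with a short sketch. The DOM and LOV weights follow the formulae given above; the octant angles (LM at 0°, NO at 45°, PA at 90°, and so on counterclockwise, with LOV on the horizontal and DOM on the vertical axis) and all function names are assumptions made for this illustration, not a reproduction of the scoring procedure in the IAS manual.

```python
import math

# Assumed octant order, each centered on a multiple of 45 degrees,
# starting with LM at 0 degrees and moving counterclockwise.
OCTANTS = ["LM", "NO", "PA", "BC", "DE", "FG", "HI", "JK"]


def coordinates(z):
    """DOM and LOV coordinates from a dict of octant z-scores."""
    dom = 0.3 * ((z["PA"] - z["HI"]) + 0.707 * (z["NO"] + z["BC"] - z["FG"] - z["JK"]))
    lov = 0.3 * ((z["LM"] - z["DE"]) + 0.707 * (z["NO"] - z["BC"] - z["FG"] + z["JK"]))
    return dom, lov


def angular_location(dom, lov):
    """Angle in degrees (0 to 360), LOV as x-axis and DOM as y-axis."""
    return math.degrees(math.atan2(dom, lov)) % 360.0


def vector_length(dom, lov):
    """Distance of the profile from the circumplex origin (in z-score units here)."""
    return math.hypot(dom, lov)


def octant(angle):
    """Octant whose 45-degree sector contains the angle."""
    return OCTANTS[int(((angle + 22.5) % 360.0) // 45.0)]


# Hypothetical respondent, elevated only on Gregarious-Extraverted (NO)
z = {"PA": 0.0, "BC": 0.0, "DE": 0.0, "FG": 0.0,
     "HI": 0.0, "JK": 0.0, "LM": 0.0, "NO": 2.0}
dom, lov = coordinates(z)
angle = angular_location(dom, lov)
print(octant(angle), round(angle, 1), round(vector_length(dom, lov), 3))
```

Because the sketch works directly on z-scores, its vector lengths are not on the same metric as the published IAS-R vector length scores and should not be compared with cut-offs reported for those scores.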


3 Such information is currently available only for the circumplex portion and not for the N, C, and O 
components of IASR-B5. 
Circumplex measures of personality may be used to classify or diagnose an individual 
as being a certain "type", depending on the octant in which the individual's 
DOM and LOV scores fall, and as expressing that type with differing degrees of 
clarity or intensity (vector length). It is therefore expected that the temporal stability 
of both self- and other-reported types will be related to vector length. An extremely 
extraverted person with substantial vector length is likely to be classified as falling 
within the NO octant six months or even six years from now. An individual with a 
much smaller vector length will not have a clear pattern of scores and therefore will 
be more likely to be classified within a different octant on re-testing. Thus, in Kurtz 
et al.'s (1999) sample, "Stability of Angular Location was found to vary inversely 
and significantly with Vector Length at Time 1 (r = -.43; p < .001) and Time 2 (r = 
-.36; p < .001)" (p. 109). 

As a consequence of the above relations, less than half of these respondents re- 
mained classified in their original octants on re-testing six months later. This situa- 
tion is not peculiar to circumplex analysis and it will be encountered whenever re- 
spondents fail to report, or are perceived not to have, a “distinctive personality”. 
Kurtz et al. recommended that IAS-R not be used to classify respondents with vector 
length scores less than 60 and they emphasized that IAS-R classification is generally 
more appropriate for clinical groups, particularly personality disorders. 


IASR-B5 and the personality disorders 


". . . the recently discovered success of the FFM in clarifying psychiatric conceptions 
of personality disorders in DSM-III-R (Costa & Widiger, 1994) must be attributed 
in large part to the two circumplex dimensions of that model which stem from a 
clinical tradition which has already served a similar function for the personality 
disorders of DSM-I (e.g., Leary, 1957), DSM-II (e.g., Plutchik & Platman, 1977), and 
DSM-III (e.g., Wiggins, 1982)" (Wiggins & Trapnell, 1996, p. 89). 


MMPI personality disorder scales 


Morey, Waugh, and Blashfield (1985) developed MMPI scales for the eleven per- 
sonality disorders described in DSM-III (American Psychiatric Association, 1980). 
Under a combined rational/empirical strategy, MMPI items were assembled into 
both overlapping and non-overlapping sets of eleven scales. The then current re- 
search on these MMPI personality disorder scales was reviewed in a symposium 
held at the 1987 meetings of the American Psychological Association (Greene, 
1987). At that symposium, Wiggins (1987) presented the results of a study designed 
to evaluate the hypothesis that the MMPI personality disorder scales may be con- 
strued within the framework of a circumplex model of personality. Two prior studies 
suggested this hypothesis: (a) Schaefer (1961) had made a convincing case for the 
interpretation of the two-dimensional structure of the MMPI as a circumplex of 
clinical scales and (b) Plutchik and Platman (1977) had demonstrated that when 
psychiatrists rated seven personality disorder labels from DSM-II on twelve 
interpersonal trait terms, the first two components extracted from the intercorrelations 
among trait terms exhibited a clear circumplex structure. Wiggins (1987) found 
support for these earlier findings, but emphasized that "Although the MMPI personality 
port for these earlier findings, but emphasized that "Although the MMPI personality 
disorder scales are likely to have interpersonal correlates, the dimensions of domi- 
nance and nurturance are unlikely to capture other non-interpersonal characteristics 
associated with these scales. A more appropriate frame of reference for examining 
the content of these scales is provided by the so-called "Big Five" dimensions of 
personality." 

An extensive multitrait-multimethod investigation of this topic was later con- 
ducted by Wiggins and Pincus (1989). A sample of 581 university students were 
administered the MMPI personality disorder scales developed by Morey et al. 
(1985) as well as the personality disorder scales of the Personality Adjective Check 
List developed by Strack (1987). Participants also completed the NEO Personality 
Inventory (NEO-PI; Costa & McCrae, 1985) and the Interpersonal Adjective 
Scales—Big 5 version (IASR-B5; Trapnell & Wiggins, 1990). Clear and meaningful 
projections of the personality disorder scales of Morey et al. and of Strack onto the 
IASR-B5 circumplex were found for the histrionic, dependent, avoidant, schizoid, 
and narcissistic disorders. These encouraging findings were qualified by the state- 
ment: “Although the circumplex model illuminated conceptions of some of the dis- 
orders, the full 5-factor model was required to capture and clarify the entire range of 
personality disorders" (Wiggins & Pincus, 1989, p. 305). 

Within five years' time, this conclusion had become canonical (e.g., Costa & 
Widiger, 1994). 


Profile analysis 


The expanded profile analysis obtained from the combination of the five-factor and 
circumplex components of the IASR-B5 made possible a more comprehensive diag- 
nostic device than was available with either component alone. Figure 3 displays the 
IASR-B5 profiles of high scoring respondents on each of three MMPI personality 
disorder scales: avoidant (N = 44), schizoid (N = 50), and antisocial (N = 50) from a 
study by Trapnell & Wiggins (1990). As a group, the antisocial respondents clearly 
fell within the cold-hearted octant, the schizoid respondents fell within the aloof- 
introverted octant, and the avoidant respondents fell within the unassured- 
submissive octant. 

Although standard circumplex analysis would end at this point, the IASR-B5 pro- 
file provides additional structural information in the bar graphs for neuroticism, con- 
scientiousness, and openness to experience. Here it is apparent that: neuroticism 
further distinguishes avoidant from schizoid respondents; avoidant respondents are 
somewhat less conscientious than schizoid respondents; and antisocial respondents 
are quite low on conscientiousness. A final distinguishing feature of avoidant 
respondents is that they are closed to experience in comparison with schizoid and 
antisocial groups. 
Figure 3. IASR-B5 profiles of respondents scoring high on MMPI personality disorder scales: 
Avoidant (N = 44), Schizoid (N = 50), and Antisocial (N = 62) (from Trapnell & Wiggins, 1990, 
page 789). 
Other applications of IASR-B5 


The circumplex portion of the IASR-B5 (i.e., IAS or IAS-R) has been used extensively 
in clinical, social, and personality research; the interested reader may consult 
the IAS manual (Wiggins, 1995) or Wiggins and Trobst (1997) for details. Here we 
will restrict our review to those published studies that have employed the full 
IASR-B5. 

The brevity of IASR-B5 and the extent to which it provides markers of both the 
five-factor and circumplex structures make it a highly efficient tool for pilot and 
exploratory studies of new domains. The complete IAS-R remains intact within this 
instrument and thus provides a measure of eight interpersonal dimensions of central 
importance to personality and social psychology (Wiggins & Broughton, 1985). It is 
ideally suited for repeated testing, mass screening, and exploratory investigations 
(Wiggins et al., 1988), and may be confidently used for classification of persons into 
typological categories (Wiggins et al., 1989). The IASR-B5 markers of the remaining 
three dimensions of the five-factor model are less comprehensive, but they are 
sufficiently related to more comprehensive measures of the Big Five to serve in an 
exploratory capacity in pilot studies of new domains. 

The IASR-B5 has served as a criterion measure for evaluating the validity of 
other instruments such as the NEO-PI-R (Caldwell-Andrews, Baer, & Berry, 2000), 
as a marker in the development of other self-report measures such as psychopathy 
scales (Hill, 2000), and as an observer form in comparing university students and 
incarcerated offenders (Hart & Hare, 1994). IASR-B5 has also served as a frame of 
reference for evaluating the (limited) range of behaviors assessed by a variety of 
social support measures (Trobst, 2000). An Italian language version of IASR-B5 has 
been developed (Di Blas, 2000) and has been related to other Italian markers of the 
Big Five (Perugini, Gallucci, & Livi, 2000). Gallo and Smith (1999) used IASR-B5 
to classify the distinctions among aggressive traits in Buss and Perry’s (1992) Ag- 
gression Questionnaire. 

Three studies of IASR-B5 stand out, in particular, for their non-obvious and/or 
surprising results. Sear and Stephenson (1997) assessed the interviewing perform- 
ance and skill of 19 police officers attached to a London Metropolitan Police De- 
partment. They filled out police interview evaluation forms for four audiotaped in- 
terviews by each of 19 officers (76 interviews in total). All officers were also ad- 
ministered the IASR-B5. Results indicated that only Openness was significantly 
associated with interviewing skill but in the opposite direction to that predicted. The 
more skillful police interviewers were closed to experience. 

The distributive justice dilemma is a classic paradigm for studying moral re- 
sponses to a task that involves allocating money to self and three others and making 
judgments about such allocation behaviors. In an interesting study of this dilemma, 
Day (1998) randomly assigned 106 female and 95 male university students to one of 
three conditions: (1) a group that responded to a hypothetical dilemma, (2) a group 
that responded to a real situation with play money, or (3) a group that responded to a 
real situation with real money. Participants also completed the Moral Judgment 
Interview (MJI; Colby & Kohlberg, 1987) and the IASR-B5. In these situations, 
Openness scores on IASR-B5 were as predictive of allocation behavior as were 
responses to the Moral Judgment Interview! 

Fehr and Broughton (2001) have demonstrated the utility of IASR-B5 in clarify- 
ing typologies of love, a topic that had previously been investigated almost exclu- 
sively within a social psychological framework. Earlier, Berscheid and Hatfield 
(1974) had made an influential distinction between passionate love (characterized 
by intense emotions, physiological arousal, and strong sexual attraction) and com- 
panionate love (deep affectional bonds based on trust, respect, caring, and honesty). 
In a university student sample, Fehr and Broughton administered IASR-B5 and the 
Views of Love Questionnaire (Fehr, 1994), in which respondents rate how similar 
their own view of love is to descriptions of 15 different types of love. The 
projections of these types of love onto the IASR circumplex were quite revealing; for 
example, romantic love fell near the center of the arrogant-calculating octant (BC), 
sexual love fell near the center of the cold-hearted octant (DE), maternal love fell 
near the center of the unassuming-ingenuous octant (JK), and parental love fell near 
the center of the gregarious-extraverted octant (NO). Correlations of types of love 
with the remaining three dimensions of the Big Five were equally interesting. For 
example, passionate kinds of love (e.g., sexual love, infatuation) were positively 
correlated with neuroticism, and companionate kinds of love (emphasizing emo- 
tional stability and calmness) were negatively related to neuroticism. Openness to 
experience was negatively related to committed love, especially for men. 

The revised Interpersonal Adjective Scales (IAS-R) provide a well documented 
assessment of the two major dimensions of interpersonal behavior (e.g., Wiggins, 
1995). The remaining three scales of personality included in IASR-B5 provide well 
established markers of neuroticism, conscientiousness and openness to experience as 
documented here and elsewhere (e.g., Trapnell & Wiggins, 1990). Taken together in 
IASR-B5, these dimensions provide a highly useful preliminary survey of dimen- 
sions of personality that has proved helpful in a variety of different contexts. 


Personality and interpersonal behavior 


In a highly integrative and informative paper, McCrae and Costa (1989) character- 
ized the relations between the interpersonal circumplex, as measured by the IAS 
(Wiggins, 1979) and the five-factor model of personality, as measured by the NEO 
Personality Inventory (Costa & McCrae, 1985). 

The circumplex is intended to include only dispositions related to interpersonal 
interactions (Wiggins, 1979); the five-factor model aims at comprehensiveness and 
so includes affective, experiential, and motivational traits as well as interpersonal 
traits (McCrae & Costa, 1989, p. 586). McCrae and Costa (1989) also indicated that 
love and status, the conceptual underpinnings of the interpersonal circumplex 
". . . are essentially interactional concepts that describe the relationships between 
two individuals; they are useful in the analysis of enduring social roles or dynamic 
social encounters. . . Love and status are not, however, necessarily the best concepts 
for understanding enduring dispositions in individuals" (p. 591, emphasis added). 

The five-factor model of personality has been interpreted from several rather dif- 
ferent theoretical perspectives (Wiggins, 1996) and one of these views, the “dyadic 
interactional perspective" (Wiggins & Trapnell, 1996), is clearly at variance with the 
position of McCrae and Costa just described. Earlier, Wiggins (1991) had argued 
that the broad philosophical concepts of agency and communion (Bakan, 1966) 
should serve as conceptual coordinates for the understanding and measurement of 
interpersonal behavior. More recently, Wiggins and Trapnell (1996) argued that 
agency and communion may be clearly identified in the two higher-order factors of 
the five-factor model (FFM). Our argument was based, in part, on Digman's (1997) 
extensive series of higher-order factor analyses of the FFM in which he indicated the 
fruitfulness of such an interpretation. This work may be summarized as demonstrat- 
ing that love (communion) and status (agency) are indeed fruitful concepts for un- 
derstanding enduring dispositions in individuals. 


Conclusions 


As will surely be apparent to any reader of this book, there are multiple approaches 
to the measurement and interpretation of the five-factor model of personality. What 
differentiates the IASR-B5 from other FFM inventories is its emphasis upon the 
interpersonal domain and the psychometric precision provided by circumplex as- 
sessment. To the rich literature of correlates of the FFM, the interpersonal circum- 
plex approach adds a similarly rich literature of correlates of the interpersonal cir- 
cumplex (see Kiesler, 1996). The structure of the circumplex allows for simultane- 
ous examination of dominant and nurturant tendencies and in so doing provides not 
only dimensional, but also categorical information and an assessment of the rigidity 
of interpersonal expression (Wiggins, 1995). With this as its strong suit, the IASR- 
B5 might be preferentially employed when one is interested in the nature and char- 
acter of interpersonal interaction in functional or dysfunctional form. 
Chapter 12 


The Big Five Marker Scales (BFMS) and the 
Italian AB5C taxonomy: Analyses from an 
etic-emic perspective 


Marco Perugini 
Lisa Di Blas 


Introduction 


In recent years several psycholexical studies to uncover the main personality factors 
have been undertaken in different languages and countries such as the USA, Ger- 
many, The Netherlands, Hungary, Turkey, Japan, and Korea (for recent reviews, see 
De Raad, 2000; Saucier, Goldberg, & Hampson, 2000). While these studies shared 
many commonalities, some procedural differences were also present. Italy repre- 
sented a unique situation: two independent psycholexical projects were conducted in 
the same language by using different approaches. These two studies have already 
been compared in some detail elsewhere (De Raad, Di Blas, & Perugini, 1998; Di 
Blas & Perugini, 2001). In this chapter, after a brief description of the two independ- 
ent projects and a comparison of their results, we merge the data. The final result is 
given in the form of a new adjective list offering a brief measure of the Big Five. 
This list, named the Big Five Marker Scales (BFMS), has two specific main fea- 
tures: a) it represents an optimal Big Five structure, with optimality meaning a facto- 
rially simple structure, and b) it can be used to map all other personality descriptive 
terms, thereby providing a comprehensive taxonomy of personality descriptors in 
the Italian language. 
The two Italian projects 


The first psycholexical study was conducted in Rome, the capital of Italy located in 
the center of the country, by Perugini and Caprara. Starting with an abridged 
dictionary, they finally selected a pool of 492 personality-relevant adjectives (Perugini, 
1993; Caprara & Perugini, 1994). The pool was chosen on the basis of lay judges' 
implicit conceptions of personality, a criterion already applied in the Dutch lexical 
project (Brokken, 1978). The 492 adjectives were then administered for a self-rating 
task to 274 participants. Factor analyses were performed on the data set, and results 
indicated five dimensions only approximately comparable to the American Big Five. 
In a second study, the set of adjectives was reduced to 285 by discarding adjectives 
with low communalities in the five-factor solution of the first study and adding a 
few adjectives. The 285 adjectives were administered to 961 participants, who pro- 
vided self- and peer-ratings. A five-factor solution again emerged, with factors 
called Extraversion/Energy, Conscientiousness, Quietness, Selfishness, and 
Conventionality. The first two closely resembled the Big Five factors Extraversion (I) 
and Conscientiousness (III); of the other three, Quietness and Selfishness were 
shown to be rotational variants of the Big Five factors Agreeableness (II) and 
Neuroticism (IV). Finally, Conventionality resembled the fifth of the Dutch lexical 
dimensions more than the fifth of the more common psycholexical Big Five 
(Intellect). 

The other psycholexical research was conducted in Trieste, which is located in 
the North-East of Italy, very close to Slovenia and Croatia. Unlike the 
Roman project, Di Blas and Forzi (1998) first reduced a large set of person 
adjectives on the basis of lay judges' implicit conceptions, and then further 
selected the adjectives by applying the categorization system of the German lexical 
project (Angleitner, Ostendorf, & John, 1990). Di Blas and Forzi collected 427 self- 
ratings and 277 other ratings on the final set of 314 adjectives, and performed factor 
analyses on the pooled data sets. Results showed a stable three-factor solution, re- 
sembling the Big Three of Peabody and Goldberg (1989). A five-factor solution 
yielded Conscientiousness, Assertiveness, Sociability, Quietness/Placidity, and Ten- 
dermindedness. These findings were replicated in subsequent taxonomic studies. In 
particular, Di Blas and Forzi (1999) used a broader set of personality adjectives, and 
performed factor analyses on 369 self-ratings: The five-factor solution again did not 
replicate exactly the Big Five, whereas the three-factor solution reproduced the Big 
Three. This solution was further developed in an Abridged Big Three Circumplex 
(AB3C) configuration (see also Di Blas, Forzi, & Peabody, 2000). 

The two Italian taxonomic projects yielded seemingly inconsistent results. 
Caprara and Perugini (1994) argued for a cross-cultural replicability of the Big Five 
(after target rotation), whereas Di Blas and Forzi (1998, 1999) did not. How can 
these findings be reconciled? Can they be ascribed to procedural differences? De 
Table 1. A representation of the five-factor structure from the pooled data sets of the Italian 
taxonomic projects 


Exuberant, extroverted, social, vivacious, cheerful, unconstrained | Silent, introverted, shy, taciturn, reserved, solitary 

Peaceful, patient, calm, tranquil, tolerant, meek | Irritable, aggressive, quarrelsome, domineering, choleric 

Precise, orderly, consistent, disciplined, industrious, responsible | Unruly, disorderly, absent-minded, inaccurate, uncautious 

Assured, resolute, strong, enterprising, decisive, bold | Suggestible, timid, anxious, vulnerable, emotional, fragile 

Sensitive, altruistic, generous, sentimental, loyal, human | Insensitive, insincere, disloyal, ruthless, greedy, perfidious 


Raad, Di Blas, and Perugini (1998) compared the two taxonomies. They used the 
Triestian 314-item set (Di Blas & Forzi, 1998, Study 1), and the Roman 260-item set 
(Caprara & Perugini, 1994, Study 2). First, they factor analyzed a pooled data set of 
1,664 ratings on 158 personality adjectives in common to the two projects, and 
found a stable five-factor structure which at best replicated the first four of the Big 
Five: Sociability (I), Placidity (II/IV), Conscientiousness (III), Self-Assurance 
(IV/II). The fifth factor was a blend of Nurturance and Integrity (Table 1). Analyses 
performed on the adjective sets specific to the two studies revealed that they covered 
comparable underlying broad dimensions, roughly dealing with factor I and a blend 
of factors III and V of the Big Five. These findings suggested that differences in 
selection procedures among the Italian studies are matters of emphasis in contents 
more than of structural discrepancies (see also De Raad, Perugini, Hřebíčková, & 
Szarota, 1998). In particular, it appeared that dissimilar selection criteria had an im- 
pact on the content of the fifth factor, which emerged as Conventionality in the Ro- 
man data and as Culture and Abilities in the Triestian data. 

More recently, Di Blas and Perugini (2001) compared larger sets of Roman and 
Triestian adjectives. In particular, they analyzed the Roman set of 492 terms 
(Caprara & Perugini, 1994, Study 1), and the Triestian set of 369 adjectives (Di Blas 
& Forzi, 1999, Study 1). The two sets had 250 adjectives in common, 119 were spe- 
cific to the Triestian study, and 242 to the Roman study. Extensive factor analyses 
spanning different factorial solutions were performed on the adjectives in common 
(765 self-ratings), and coefficients of congruence indicated three to five stable di- 
mensions across subsamples of participants. Again, the five-factor solution did not 
reproduce the common Big Five; rather, the five dimensions were comparable to 
those found by De Raad et al. (1998). As regards the specific sets, the five-factor 
solutions appeared comparable in terms of content (basically, they could be inter- 
preted as facets of Extraversion, Agreeableness, and Conscientiousness), with one 
main exception again concerning the fifth factor: in the Roman data it was defined 
as Conventionality and in the Triestian data as Culture. 
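
As an illustration of how factor solutions from two subsamples can be compared, the following sketch computes a coefficient of congruence between two columns of loadings. It assumes the usual Tucker formulation (the sum of cross-products divided by the square root of the product of the sums of squares); the text does not specify which variant was used, and the loadings shown are hypothetical.

```python
import math


def congruence(x, y):
    """Tucker's coefficient of congruence between two loading vectors."""
    cross = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return cross / norm


# Hypothetical loadings of the same five adjectives on one factor,
# estimated separately in two subsamples
sample_a = [0.71, 0.65, 0.58, 0.12, -0.05]
sample_b = [0.68, 0.70, 0.49, 0.20, 0.02]
print(round(congruence(sample_a, sample_b), 3))
```

The coefficient is insensitive to the overall size of the loadings, so two factors with proportional loadings obtain a value of 1 even when one solution has uniformly weaker loadings.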

In sum, these findings indicate the peculiar composition of the five-factor solu- 
tion in the Italian language with respect to the common Big Five. Together with 
some other studies (e.g., Almagor, Tellegen, & Waller, 1995), they counterbalanced 
the enthusiastic claims that the Big Five are the universal dimensions of personality 
(McCrae & John, 1992). 


The Big Five in the Italian language 


A "standard" Big Five taxonomy failed to emerge in the Italian language from an 
emic perspective, that is, from studies of personality lexicon conducted from within 
the cultural system (Berry, 1969). It might be argued that the Big Five failed to 
emerge because they were not well represented in the Italian studies. However, this 
is not the case. In fact, Di Blas and Perugini (2001) could develop a brief adjectival 
Big Five measure from a set of 250 adjectives in common to the two Italian projects 
through applying an etic-emic strategy. First, they followed an etic approach, and 
selected 82 adjectives already classified as Big Five markers in previous lexical 
studies (De Raad, Hendriks, & Hofstee, 1994; Goldberg, 1992; Hofstee, De Raad, & 
Goldberg, 1992; Perugini & Leone, 1996; Trapnell & Wiggins, 1990). Then, they 
followed an iterative procedure, ideally resulting in a Big Five factor solution with 
orthogonal dimensions, each loaded by 10 markers. These so-called Big Five Marker 
Scales (BFMS) are presented in Table 2. Note that the fifth dimension encompasses 
adjectives mainly referring to divergent (Creativity) rather than convergent thinking 
(Intellect). Briefly, the Big Five do not represent particularly well the emic 
five-factor structure in Italian, but they can be recovered fairly well using ad hoc 
procedures. 

This is the case for other languages as well: The Big Five are not neatly repro- 
duced in emic studies, but Big Five markers can be rather easily selected (e.g., 
Boies, Lee, Ashton, Pascal, & Nicol, 2001; Hahn, Lee, & Ashton, 1999). Do these 
findings support the cross-cultural generality of the Big Five? Have taxonomers 
found the basic coordinates of a cross-cultural map of personality lexicon? Probably 
the answer to these questions might be No if we adopt the view that the coordinates 
should correspond to these lexical dimensions emerging consistently from independ- 
ent emic studies conducted in many different languages (John, Goldberg, & Angleit- 
ner, 1984). In fact, it has been already shown that the Big Five do not fully satisfy 
rigorous psychometric tests of cross-cultural stability (De Raad et al., 1998; 
Hofstee, Kiers, De Raad, Goldberg, & Ostendorf, 1997; for a different view, see Saucier 
et al., 2000). The answer, however, may be Yes if we decide on a priori or conven- 
tional coordinates of such an international taxonomy. Many studies have shown that 
the Big Five can be recovered adopting a more etic-oriented perspective. Therefore, 
a convenient strategy might be to find an optimal Big Five structure that can be used 
as a grid on which other aspects of personality can be located and understood, 
whereby for optimal is especially meant a factorially simple structure (cf. Perugini, 
1999; Perugini, Gallucci, & Livi, 2000). 

In this chapter we look at our data from this latter perspective, and present a con- 
Table 2. Big Five Marker Scales (BFMS) in the Italian language 


I Extraversion 
Extroverted (Estroverso)          Reserved (Riservato) 
Warm-hearted (Espansivo)          Shy (Timido) 
Open (Aperto)                     Silent (Silenzioso) 
Exuberant (Esuberante)            Introverted (Introverso) 
Vivacious (Vivace)                Reserved (Chiuso) 

II Agreeableness 
Altruistic (Altruista)            Egoistic (Egoista) 
Agreeable (Disponibile)           Revengeful (Vendicativo) 
Generous (Generoso)               Cynical (Cinico) 
Sympathetic (Comprensivo)         Egocentric (Egocentrico) 
Hospitable (Ospitale)             Suspicious (Sospettoso) 

III Conscientiousness 
Precise (Preciso)                 Untidy (Disordinato) 
Orderly (Ordinato)                Inconstant (Incostante) 
Diligent (Diligente)              Careless (Impreciso) 
Methodical (Metodico)             Careless (Sbadato) 
Conscientious (Coscienzioso)      Rash (Incosciente) 

IV Emotional Stability 
Self-assured (Sicuro)             Nervous (Nervoso) 
Serene (Sereno)                   Anxious (Ansioso) 
Calm (Calmo)                      Emotional (Emotivo) 
Impassive (Impassibile)           Susceptible (Suscettibile) 
Jealous (Geloso)                  Touchy (Permaloso) 

V Creativity 
Creative (Creativo)               Superficial (Superficiale) 
Imaginative (Fantasioso)          Obtuse (Ottuso) 
Original (Originale) 
Ingenious (Ingegnioso) 
Poetic (Poetico) 
Intuitive (Intuitivo) 
Intelligent (Intelligente) 
Rebellious (Ribelle) 


tribution in the development of such a conventional Big Five structure of personality 
adjectives. 


The Big Five Marker Scales 


The main aim of this chapter is to present the Big Five Marker Scales (BFMS), an 
adjective list composed of 50 adjectives that were specifically selected to 
optimize the simplicity of the five-factor solution based on the combined data of 
the two Italian psycholexical studies (Di Blas & Perugini, 2001). In the original 
study, the markers were shown to have satisfactory structural properties: the average 
value of Kaiser's index of factorial simplicity was .86 (Kaiser, 1974); the average 
inter-scale correlation was .14 (absolute values); and Everett's (1983) generalizability 
coefficient ranged from .96 to .99 across the Roman and Triestian data sets. 
In this chapter we present some additional data and review the 
psychometric properties of the BFMS. Furthermore, we present a comprehensive 
Italian adjective trait taxonomy, which can be useful for comparisons with taxonomies 
developed in other languages as well as for measuring specific aspects of 
personality within the Italian context. We arranged the whole set of personality 
adjectives selected in the Italian studies (611 terms) according to an Abridged Big 
Five Circumplex structure (AB5C; cf. Hofstee et al., 1992). The coordinates of this 
AB5C space were based on the factorial space created by the BFMS. 


Sample 


The psychometric properties of the BFMS were assessed on a data set of 1,029 
self-ratings provided by 668 females (average age = 23.3, SD = 7.7) and 331 males 
(average age = 26.6, SD = 10.4). Thirty participants had missing information. 
Participants were instructed to rate on a 7-point scale (from 1 = not at all, to 7 = very 
much) the extent to which each adjective applied to themselves. Note that the 
data set comprises 765 self-ratings already used by Di Blas and Perugini (2001) 
to develop the BFMS, as well as 264 new self-ratings collected alongside other Big 
Five measures. In particular, 92 participants completed the NEO-PI-R (24 males and 
68 females; average age = 23.8, SD = 8.0), 94 completed the Big Five Questionnaire 
(BFQ; 14 males and 80 females; average age = 20.9, SD = 3.0), and 78 completed 
both the Five Factor Personality Inventory (FFPI) and the International Personality 
Item Pool (IPIP; 31 males and 45 females; average age = 26.3, SD = 11.2). 


Structural validity 


Table 3 presents the varimax-rotated five-factor solution of the Big Five markers 
resulting from 1,029 self-ratings. Data were ipsatized before factoring in order to 
remove idiosyncratic components in the use of rating scales. The first ten eigenvalues 
were 6.34, 5.57, 3.48, 3.08, 2.24 (the first five explaining 41.4 per cent of the 
total variance), 1.52, 1.34, 1.18, 1.06, and 1.05. A major drop between the fifth and 
sixth components suggested that a five-factor solution was adequate. An inspection of 
the factor loadings (Table 3) indicated that the factors were defined by the expected 
markers with medium to high loadings. Most adjectives presented a marker index higher 
than .40 (Perugini et al., 2000) and high values of Kaiser's factorial simplicity index, 
indicating that the markers were factorially simple and that the resulting factor 
structure approximated a simple structure very well. The only exception was for a few 
adjectives measuring the fifth dimension, Creativity. 
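
The two preparatory steps mentioned above can be illustrated with a minimal sketch. It assumes that ipsatization means standardizing each respondent's ratings around his or her own mean and standard deviation (some authors only subtract the mean), and that the components were extracted from a correlation matrix, so that the total variance equals the number of adjectives (50). The function names are illustrative and do not come from the original study.

```python
from statistics import mean, pstdev


def ipsatize(ratings):
    """Standardize each respondent's ratings around that respondent's own mean.

    `ratings` is a list of rows, one row of item ratings per respondent.
    This within-person standardization is one common form of ipsatization;
    it is assumed here, not taken from the chapter.
    """
    out = []
    for row in ratings:
        m = mean(row)
        s = pstdev(row)
        # A respondent who gives the same rating everywhere has no spread;
        # return zeros rather than dividing by zero.
        out.append([0.0 if s == 0 else (x - m) / s for x in row])
    return out


def explained_variance(eigenvalues, n_variables, n_factors):
    """Share of total variance captured by the first `n_factors` components."""
    return sum(eigenvalues[:n_factors]) / n_variables


# The ten eigenvalues reported in the text.
EIGENVALUES = [6.34, 5.57, 3.48, 3.08, 2.24, 1.52, 1.34, 1.18, 1.06, 1.05]

share = explained_variance(EIGENVALUES, n_variables=50, n_factors=5)
print(round(share, 3))  # 0.414, i.e. the 41.4 per cent reported in the text
```

The first five eigenvalues sum to 20.71, and 20.71 / 50 reproduces the reported 41.4 per cent.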

When Oblimin rotation was applied to the five-factor solution, factor correlations 
ranged from -.11 to .17. Therefore, the BFMS approximated orthogonality well, 

Table 3. Five-factor solutions of the BF markers (after varimax rotation) 


Adjective I III II IV V h² MI FSI 
Extroverted (Estroverso) .79 -.05 .15 .07 .05 .66 .74 .95 
Warm-hearted (Espansivo) 27:32:05 31 ‚04 ‚09 .64 .59 .83 
Open (Aperto) .69 .03 .28 .08 .00 .56 .58 .85 
Exuberant (Esuberante) .60  -.10 .03 .04 .24 .43 .53 .84 
Vivacious (Міуасе) .59  -.05 .18 .19 .16 ‚45 255 277) 
Reserved (Riservato) 2:59 0020 .09 .19 .00 .43 .54 .81 
Shy (Timido) -.69  -.03 ОН 13 -.09 „55 ‚63 187 
Silent (Silenzioso) -72  .06 .08 213 -.01 .54 .69 .96 
Introverted (Introverso) -.79 -.01 -.04 .01 .00 .63 .79 .99 
Reserved (Chiuso) -.80 -.01 -.08 -.04 -.03 .65 .78 .98 
Altruistic (Altruista) .09  -.02 272 02 .02 252) ‚71 .99 
Agreeable (Disponibile) .16 .00 .69 .06 .00 .50 .65 .95 
Generous (Generoso) .08 -.09 .66 .04 .04 .45 .65 .97 
Sympathetic (Comprensivo) .02 .06 .59 .07 .04 .36 .58 .97 
Hospitable (Ospitale) .17 -04 .53 .10 .04 35 .50 .85 
Suspicious (Sospettoso) -.04 00 -.41 -.23 -.12 123 37 273 
Egocentric (Egocentrico) 15 - 12 -42 06 ЛӘ .24 .40 .74 
Cynical (Cinico) -08 -.10 .-.43  .09 .06 .21 .42 .88 
Revengeful (Vendicativo) .12 -08 -.48 -.11 -.04 .26 ‚47 ‚89 
Egoistic (Egoista) -06 -08 -.58  .01 -.04 :35 297, .96 
Precise (Preciso) -.07 .74  .00 .14 .10 59 .70 .93 
Orderly (Ordinato) -.06 .73 .06 ail -.03 155 .71 .97 
Diligent (Diligente) -.07 .67  .21 .04 -.07 .50 .61 .90 
Methodical (Metodico) -.15 .64  -.01 .08 -.11 .46 .61 .89 
Conscientious (Coscienzioso) -.10 .60 .27 .03 -.06 .46 .52 .78 
Rash (Incosciente) .08 -.47 -.09 .07 EU 225 .46 .94 
Careless (Sbadato) -.09  -.60 Sud 10 -.02 139 .59 192 
Careless (Impreciso) -06  -.63  .11 -.04 -.18 .44 .59 .90 
Inconstant (Incostante) .01 -.64 -.09 -.16 -.05 .45 .61 .91 
Untidy (Disordinato) .01 -.71 .06 .00 .00 .50 .70 1.0 
Self-assured (Sicuro) .29 .20  -.04 .61 als ‚52 51 .72 
Serene (Sereno) .20 .18 ВОС 59—14 Je .49 .66 
Calm (Calmo) -.26 .17 {25 aei =T 25/1 .51 .66 
Impassive (Impassibile) -.17 .01 -.22 .44 -.07 .28 .40 .69 
Jealous (Geloso) .07 -.04 -.11  -.33 -.14 А15 232 273 
Touchy (Permaloso) 02 -.06 -.21 -.46  -.21 .30 42 ‚70 
Susceptible (Suscettibile) 04 -.11  -.13 -.53 -.18 .34 .50 .83 
Emotional (Emotivo) 4.15 -.02 .26 -.58 -.04 ‚43 2511 ‚78 
Anxious (Ansioso) -.14 .08 04  -.65 -.09 ‚45 .62 .94 
Nervous (Nervoso) -.06 -.02 -.17 -.66 -03 > .47 62 .93 
Creative (Creativo) 241 .05 .07 .05 .74 257. .72 .96 
Imaginative (Fantasioso) .17 .14 .11 -.04 .68 .52 .64 .89 
Original (Originale) .21 .08 .00 .14 .61 .44 .56 .85 
Ingenious (Ingenioso) .01  -.06  -.08 223 259 .41 .58 .85 
Poetic (Poetico) -.03 .07 12006215 .46 ‚15 ‚44 ‚85 
intuitive (Intuitivo) .03 -.04 .03 .14 ‚41 ‚19 .39 ‚88 
Intelligent (Intelligente) .04 -.07  -.09 .28 .38 223 .32 .63 
Rebellious (Ribelle) .20 72800-72388 227, .26 „12 .28 
Superficial (Superficiale) -.03 .34 -.04 215 -.27 aet] .19 235 


Note: MI = marker index; FSI = factorial simplicity index 
which constituted the main psychometric rationale we used to develop the AB5C 
taxonomy. The simplicity was formally assessed by calculating correlations between 
scale scores and factor scores (cf. Ten Berge & Knol, 1985). The observed values 
were .99, .95, .99, .98, and .96 for I to V of the Big Five, respectively, and 
off-diagonal values ranged from -.025 to .041. These results demonstrate the overall 
simplicity of the factor structure. The stability of the structure was ascertained by 
calculating Everett's generalizability coefficient on two random subsamples: the 
values were .99, .99, .99, .98, and .97 for factor I to factor V, respectively. 


Descriptive statistics and reliability 


Descriptive statistics and reliability values are reported in Table 4. Separate statistics 
were calculated by gender and age (below and above 25 years old). Table 4 shows 
that there were no significant differences for the Extraversion and Creativity 
scales. Significant differences were found for the Conscientiousness scale, with older 
people describing themselves as more conscientious than younger participants, and 
for the Emotional Stability scale, with females having a lower average score than 
males. For the Agreeableness scale, differences were significant for both gender and 
age, with females being more agreeable than males, and older participants more 
agreeable than younger ones. Internal consistency values varied from .73 for Creativity 
to .89 for Extraversion. From an applied point of view, we suggest using orthogonalized 
factor scores instead of raw scores to measure the five dimensions. These can 
be obtained by multiplying the z-scores on each factor, which can be calculated by 
using the values given in Table 4, by the factor weights reported in Appendix 1. 
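
The scoring recipe just described can be sketched as follows. The means and standard deviations are the Table 4 values for adult males; using subgroup norms in this way is an assumption. The weight matrix is purely hypothetical, since Appendix 1 is not reproduced in this section, and the function names are illustrative.

```python
# Means and SDs for adult males (scales I to V), taken from Table 4.
MEANS = [44.78, 50.92, 49.30, 39.83, 48.72]
SDS = [10.44, 7.66, 9.21, 8.98, 7.56]

# HYPOTHETICAL factor weights (rows = scales I..V, columns = factors I..V).
# The real weights are those of Appendix 1, which is not available here.
WEIGHTS = [
    [1.02, -0.05, 0.03, -0.08, -0.04],
    [-0.05, 1.03, -0.06, -0.02, -0.03],
    [0.03, -0.06, 1.01, -0.04, 0.02],
    [-0.08, -0.02, -0.04, 1.02, -0.05],
    [-0.04, -0.03, 0.02, -0.05, 1.01],
]


def z_scores(raw, means, sds):
    """Convert raw scale scores to z-scores using normative means and SDs."""
    return [(x - m) / s for x, m, s in zip(raw, means, sds)]


def factor_scores(z, weights):
    """Weighted sum of z-scores for each factor (z multiplied by the weights)."""
    n_factors = len(weights[0])
    return [
        sum(z[j] * weights[j][k] for j in range(len(z)))
        for k in range(n_factors)
    ]


# Example: an adult male one SD above the mean on Extraversion, average elsewhere.
raw = [55.22, 50.92, 49.30, 39.83, 48.72]
z = z_scores(raw, MEANS, SDS)
scores = factor_scores(z, WEIGHTS)
```

With these illustrative weights, the respondent's Extraversion factor score is close to 1 and the other four scores are close to 0, which is the behavior one would expect from a nearly orthogonal weighting scheme.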


Convergent validity 


To assess convergent validity, factor scores on the BFMS were correlated with 
scores on four well-known Big Five questionnaires that are extensively described 
elsewhere in this book and therefore presented very briefly in this context. 


NEO-Personality Inventory-Revised 


The NEO-PI-R is a widely used measure of the Big Five dimensions of personality, 
developed by Costa and McCrae (1985, 1992), and containing 240 items, 48 for 
each of the five main scales: Neuroticism, Extraversion, Openness to Experience, 
Conscientiousness, and Agreeableness. Each scale has six facets. For the present 
sample, reliability values (Cronbach's alphas) were .93, .89, .88, .88, and .92 for 
Neuroticism to Agreeableness, respectively; they ranged from .64 to .86 for twenty-six 
subscales, and were .38 for O6-Values, .47 for O2-Aesthetics, .50 for 
A6-Tendermindedness, and .55 for E5-Excitement Seeking. The five dimensions were 
not completely orthogonal: Neuroticism correlated -.48 with Extraversion, and -.42 
with Conscientiousness; Extraversion correlated .39 with Openness to Experience. 
Table 4. Descriptive statistics and reliability values for the Big Five Marker Scales 


Scales: I = Extraversion, II = Agreeableness, III = Conscientiousness, 
IV = Emotional Stability, V = Creativity 

                            I            II           III          IV           V 
Males 
  Young (N = 220)   Mean    43.18        49.08        42.49        39.42        50.12 
                    SD       9.98         8.68        10.65         9.13        Viet 
  Adult (N = 107)   Mean    44.78        50.92        49.30        39.83        48.72 
                    SD      10.44         7.66         9.21         8.98         7.56 
Females 
  Young (N = 554)   Mean    45.17        52.63        44.48        33.19        49.33 
                    SD      12731         7.53        10.91         8.76         7.65 
  Adult (N = 106)   Mean    44.73        54.41        50.18        36.29        47.99 
                    SD      10.87         7.65         9.65         9.68         8.56 
ANOVA 
  Age                                    [2800755     F = 50.36** 
  Gender                                 F = 30.67**               F = 44.36** 


Alphas                      .89          .76          .86          .77          .73 


Note: *p < .01, **p < .001 


Big Five Questionnaire 


The BFQ is a Big Five measure developed in the Italian language (Caprara, 
Barbaranelli, Borgogni, & Perugini, 1993). It is composed of 132 items, 24 for each of the 
five main scales: Energy, Friendliness, Conscientiousness, Emotional Stability, and 
Openness. Twelve items form the Lie scale. Each Big Five scale has two facets. For 
the present sample, alpha values were .81, .72, .82, .87, and .76 for Energy to 
Openness, respectively, and they ranged from .57 (Openness to Experience) to .83 
(Impulse Control) for the ten facets. For the Lie scale, the internal consistency was 
.62. The inter-correlation matrix revealed the relative orthogonality of the five main 
scales; only the fifth dimension presented significant correlations (at p = .01) of .33 
with Conscientiousness, and .30 with Friendliness. 


Five Factor Personality Inventory 


The FFPI is a Big Five questionnaire containing 100 brief statements (Hendriks, 
1997; Hendriks, Hofstee, & De Raad, 1999). It was developed on the basis of previous 
lexical studies in Dutch culminating in the Abridged Big-Five Dimensional 
Circumplex (AB5C) model (De Raad, Hendriks, & Hofstee, 1992, 1994; Hofstee et al., 
1992). The statements were selected across the AB5C space in order to represent the 
five dimensions also in their nuances. Unlike the NEO-PI-R and BFQ, the FFPI 
defines the fifth factor as Autonomy. For the present sample, alpha reliabilities were 
.91, .82, .86, .93, and .85 for Extraversion (I) to Autonomy (V), respectively. Only 
one of the correlations between the factors (using orthogonalized factor scores as 
recommended by Hendriks, 1997) was significant (-.29 between Conscientiousness 
and Autonomy). 


International Personality Item Pool 


The IPIP is a Big Five inventory described by Goldberg (1999), and it includes 100 
items. In the present study, we used the shorter 50-item version. For the present 
sample, alpha reliabilities were .85, .85, .83, .87, and .75 for the first to the fifth of 
the Big Five, respectively. The inter-correlation matrix did not present any signifi- 
cant value, with correlations ranging from -.11 to .23. 

Correlations between the BFMS scales and the other Big Five measures are re- 
ported in Table 5. Consider first the NEO-PI-R. Evidence of convergent validity was 
found for all five Big Five Marker scales, with each factor correlating mainly with 
the corresponding BFMS factor. A secondary correlation of -.39 between BFMS-III 
and NEO-PI-R Openness to Experience emphasizes the content of the Italian Con- 
scientiousness scale, which refers to order and dutifulness without indulging in fan- 
tasies. 

As regards the correlations of the BFMS with the NEO-PI-R facets, results provide 
evidence of convergent validity for twenty-two of the thirty facets, which have their 
highest correlation with the expected BFMS scale. Clear convergent validity was 
observed for BFMS-III, with values ranging from .46 to .80; Order (C2), 
Self-Discipline (C5), and Dutifulness (C3) were shown to characterize the Italian 
Conscientiousness scale. Although more modest in magnitude (.44 to .59), convergent 
correlations were found for BFMS-II; among the NEO-PI-R Agreeableness facets, 
Modesty (A5) and Compliance (A4) appeared to be peripheral to BFMS Agreeableness, 
being also correlated with BFMS-I and -IV, respectively. Correlations higher 
than .44 were observed between BFMS-I and NEO-PI-R Extraversion facets; 
Warmth (E1) was located between BFMS-I and -II; and Excitement Seeking (E5) 
was unrelated to BFMS-I, whereas it did correlate significantly with BFMS-III 
(-.36). If the value of -.28 found between BFMS-III and Impulsiveness (N5) is also 
considered, then it can be concluded that impulse control variables are related to the 
third of the Big Five in the Italian context. As regards BFMS-IV, convergent validity 
was partially supported, with low levels of Anxiety (N1) and Angry Hostility 
(N2) being primarily related to the Italian Emotional Stability scale. For the fifth of 
the Big Five, convergent values ranged from .30 to .41. They suggest that creativity 
is relatively marginal to the definition of NEO-PI-R Openness, which is conceived 
in terms of readiness to appreciate rather than produce art, new ideas, and values. 

Concerning the relations with the BFQ measure, convergent validity was supported 
for the first, third, and fourth of the Big Five, both for the general factors and 
the specific facets. A correlation of .51 was observed between BFMS-II and 
BFQ-Friendliness, with Politeness (F2) being also related to BFMS-IV and -V. As regards 
the fifth dimension, a correlation of .24 (p = .03) was observed between the two 
measures: the relation can be mainly ascribed to the facet Openness to Culture (.30). 
However, the values clearly indicate that BFMS and BFQ represent different aspects 
of the debated fifth of the Big Five. Correlations observed between the BFMS and the 
IPIP and FFPI questionnaires are also reported in Table 5. Results provide clear evidence 
of convergent validity for all factors. However, the FFPI-Autonomy scale was found 
to have a sizable secondary correlation with BFMS-I (.32). 

In general, results support the convergent validity of the BFMS. The composition of 
the fifth factor of the BFMS is peculiar, and it seems to be more linked to the 
productive aspects of creativity and intelligence than to openness to new experiences 
or culture. 


Table 5. Significant correlations observed between BFMS (factor scores) and NEO-PI-R (raw 
scores), BFQ (T-scores), IPIP (raw scores), and FFPI (orthogonalized factor scores). 


Big Five Marker Scales 


I II III IV V 

NEO-PI-R 
Extraversion (E) .74 
Agreeableness (A) .76 
Conscientiousness (C) .83 
Neuroticism (N) -.36 -.57 
Openness (O) -.39 .48 

Warmth (E1) .47 .48 
Gregariousness (E2) .45 
Assertiveness (E3) .65 -.31 
Activity (E4) .52 .33 
Excitement Seeking (E5) -.36 
Positive Emotions (E6) .48 

Trust (A1) .47 .28 
Straightforwardness (A2) 294 
Altruism (A3) .59 
Compliance (A4) -.31 .56 .43 
Modesty (A5) -.41 .44 
Tendermindedness (A6)* .45 

Competence (C1) .52 
Order (C2) .80 
Dutifulness (C3) .66 
Achievement Striving (C4) .60 
Self-Discipline (C5) .75 
Deliberation (C6) .46 

Anxiety (N1) * -.61 
Angry Hostility (N2) -.40 -.54 
Depression (N3) -.36 -.47 
Self-Consciousness (N4) -.37 -.34 
Impulsiveness (N5) -.28 
Vulnerability (N6) -.28 -.52 

Fantasy (O1) -.35 .39 
Aesthetics (O2)* .32 -.31 .30 
Feelings (O3) .41 
Actions (O4) -.36 Bi 
Ideas (O5) 
Values (O6)* -.30 

BFQ 
Energy (E) .72 
Friendliness (F) .51 
Conscientiousness (C) .67 
Emotional Stability (ES) .68 
Openness (O) (.24) 
Lie (L) 

Dynamism (E1) .71 
Dominance (E2) .48 
Cooperativeness (F1) .46 
Politeness (F2) .39 .34 -.30 
Scrupulousness (C1) -.36 .68 
Perseverance (C2) .46 
Emotion Control (ES1) .71 
Impulse Control (ES2) .48 
Op. to Culture (O1) 
Op. to Experiences (O2) .30 

IPIP 
Extraversion .77 
Agreeableness .66 -.32 
Conscientiousness .80 
Emotional Stability .69 
Intellect .53 

FFPI 
Extraversion .73 
Agreeableness .60 
Conscientiousness .83 
Emotional Stability .67 
Autonomy .32 .39 


Note: Correlations are significant at p < .01. The highest values are given in bold. * Alpha reliability 
values ≤ .50. 


An AB5C taxonomy of Italian personality trait adjectives 


The BFMS provided the conventional Big Five map on which all other adjectives 
could be located. In particular, the BFMS were related to the 200 adjectives 
common to the two Italian lexical projects (765 self-ratings), the 242 adjectives 
unique to the Roman study (275 self-ratings), and the 119 adjectives specific to the 
Triestian study (491 self-ratings), for a total of 561 adjectives. Following the AB5C 
model of Hofstee et al. (1992), the five-factor space was ordered into ten circumplexes, 
each formed by pitting one of the Big Five factors against another. The 
two highest observed correlations between a given term and the BF marker factors 
served first to assign the term to one of the ten circumplexes and second to calculate 
the angular location (AL) and the vector length (VL) of a given adjective within the 
pertaining circumplex. For each adjective, AL and VL were calculated as follows: 


\[ \mathrm{AL} = \arctan\left(\frac{bf_1}{bf_2}\right) \]


\[ \mathrm{VL} = \sqrt{bf_1^2 + bf_2^2} \]


with bf_1 and bf_2 being the correlations of a given item with the two Big Five 
coordinates (y axis and x axis) of the assigned circumplex. After each adjective is 
assigned to one of the ten bi-dimensional spaces, and its VL and AL are calculated, the 
next issue is how to divide each bi-factorial space. Two main alternatives are 
available (cf. Perugini, 1999): eight sectors of 45 degrees each, leading to a total of 25 
bipolar spaces, and twelve sectors of 30 degrees each, for a total of 45 bipolar spaces. 

As shown in Table 6a, however, in this latter case, which is the option adopted by 
the Dutch-English AB5C model (cf. Hofstee et al., 1992), a relevant number of 
sectors was poorly represented. Indeed, twenty-four of the 90 sectors (that is, 27 per cent) 
contained between 0 and 2 adjectives, often with a VL lower than .30 (in total, 160 
of the 561 adjectives had a VL < .30). Therefore, we opted for the first alternative of 
using larger sectors of 45 degrees, giving a total of 25 bipolar spaces. Table 6b 
reports frequencies of adjectives classified in each octant. 
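
The assignment procedure can be made concrete with a short sketch. Several choices here are assumptions rather than details given in the text: the lower-numbered factor of each pair is placed on the x axis, `atan2` is used in place of a plain arctangent so that the quadrant is resolved from the signs of the two correlations, and sectors are centered on the axes so that sector 0 is the pure positive pole of the x-axis factor. The function names and the example correlations are hypothetical.

```python
import math


def assign_circumplex(correlations):
    """Return the indices (0-4) of the two factors with the largest
    absolute correlations, lower-numbered factor first."""
    ranked = sorted(range(len(correlations)),
                    key=lambda i: abs(correlations[i]), reverse=True)
    first, second = sorted(ranked[:2])
    return first, second


def locate(bf_y, bf_x):
    """Angular location (degrees, 0-360) and vector length of an adjective."""
    vl = math.hypot(bf_y, bf_x)
    al = math.degrees(math.atan2(bf_y, bf_x)) % 360.0
    return al, vl


def sector_index(al, width):
    """Index of the sector containing angle `al`, with sector 0 centered
    on the positive x axis and sectors numbered counterclockwise."""
    return int(((al + width / 2.0) % 360.0) // width)


def bipolar_facets(width, n_factors=5):
    """Number of bipolar facets implied by a given sector width.

    Each circumplex has 360/width sectors, four of which are pure factor
    poles shared with other circumplexes; the rest are blends.
    """
    n_circumplexes = n_factors * (n_factors - 1) // 2
    sectors = int(360 // width)
    blends = n_circumplexes * (sectors - 4)
    pure = 2 * n_factors
    return (blends + pure) // 2


# Hypothetical adjective correlating .40 with factor I and .30 with factor II.
corr = [0.40, 0.30, 0.05, -0.10, 0.02]
x_factor, y_factor = assign_circumplex(corr)
al, vl = locate(corr[y_factor], corr[x_factor])
octant = sector_index(al, 45)
```

For this hypothetical adjective the vector length is .50 and the angle is about 37 degrees, which places it in the I+II+ octant. `bipolar_facets(45)` and `bipolar_facets(30)` return 25 and 45, matching the totals given in the text.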

Figure 1 presents illustrative adjectives (selected among those with the highest 
vector lengths) of the ten two-factor spaces, and plots of all the adjectives assigned 
to each circumplex. 

This AB5C taxonomy of Italian personality adjectives provides interesting 
information. First, of the ten bi-dimensional spaces, those for BF II x III, III x IV, II x IV, 
and I x IV included the largest numbers of adjectives, and the space formed by BF 
IV x V included the smallest number of terms. These findings are comparable to 
those reported for both Dutch and American-English taxonomies (De Raad et al., 
1994; Hofstee et al., 1992). Second, the cells are filled by adjectives consistent in 
terms of content, although some of them were selected from the independent 
Triestian and Roman data sets. Third, of the possible 25 bipolar facets, 18 are 
well-defined: theoretically opposite octants (e.g. I+II+ vs. I-II-) contain adjectives 
opposite in terms of content, and include at least three items with VL > .30. For example, 
of the nine possible facets for factor I, seven are well-defined: Talkative-Silent, 
Sociable-Aloof, Mild-Domineering, Prudent-Rash, Unconstrained-Insecure, 
Quiet-Frenetic, Brilliant-Boring. The bipolar sectors I+V- versus I-V+, III-IV+ versus 
III+IV-, and IV-V+ versus IV+V- are insufficiently represented. Of the remaining 
cells, only one pole is well-defined (e.g. I-III- but not I+III+). It should also be noted 
that the IV+ and V- octants are poorly represented as well. This result is consistent with 



Table 6. AB5C ordering of personality adjectives in Italian. 


A. Number of adjectives assigned to the 30° sectors of the ten circumplexes. 


I+ II+ III+ IV+ V+ I- II- III- IV- V- 

5 0 

5 1 

1 2 

/ 1 

0 / 

11 9 

15 1 

IH- 7 20 2 
3 


, 


Note: The adjectives had a primary correlation on the column Big Five dimension, and a 
secondary correlation with the row Big Five dimension 


B. Number of adjectives assigned to the 45° sectors of the ten circumplexes. 


I+ II+ III+ IV+ V+ I- II- III- IV- V- 


previously developed AB5C taxonomies in Dutch and American-English. Finally, 
the findings illustrated in Figure 1 highlight further consistencies between 
AB5C structures developed in different languages. There appear to be striking 
similarities between the continua observed by Hofstee et al. (1992) and De Raad et al. 
(1994) and those presented here for the spaces formed by BF I x II, I x IV, I x V, II x 
IV, II x V, and III x V. Interestingly, similarities can be found not only for the well-filled 
circumplexes (e.g., I x II), but also for those that are less filled, such as BF III x V 
(industrious and refined are located between III+ and V+ in both the American-English 
and Italian taxonomies). In more detail, it can easily be noted, for example, that the 
Dutch, American-English, and present AB5C structures locate optimism- and 
forcefulness-related adjectives in the I+IV+ sector; peacefulness-related terms in the 
II+IV+ sector; stability- and persistence-related adjectives in the III+IV+ sector; 
simplicity in the II-V- cell; dependence in the IV-V- octant, and so forth. Some 
differences emerged as well. For example, the fifth positive pole is largely represented 
by adjectives suggesting divergent thinking, whereas this cell was not represented in 
Dutch; conversely, the negative pole of the fifth factor was well filled in Dutch but 
not in Italian. Aggressiveness is a relevant component of the first of the Big Five in 
American-English but not in Dutch or Italian. Depression and anxiety represent Emotional 
Instability in Italian, whereas sensitiveness-related terms define the pure IV- sector 
in Dutch, and moody and possessive characterize the same cell in American-English. 


Figure 1a-e. The ten bi-dimensional spaces formed by the BFMS factors (panels I/II, 
I/III, I/IV, I/V, II/III, II/IV, II/V, III/IV, III/V, and IV/V). Illustrative items with 
the highest vector length are reported. 


Big Five for ever? 


When the structure of personality-referring adjectives is analyzed in the Italian lan- 
guage from an emic perspective, findings consistently show that the Big Five are 
only partially replicated. Here, we analyzed a larger pool of adjectives (N = 611) 
than those used in previous studies (Caprara & Perugini, 1994; Di Blas & Forzi, 
1998, 1999) since we merged the data of the two Italian lexical projects and ana- 
lyzed them from an emic-etic perspective. We focus this final paragraph on two 
points. 

First, whereas our approach follows an emic-etic perspective, it is not meant to be 
a full-fledged application of the emic-etic methodology. We developed a five-factor 
taxonomy whose coordinates were determined a priori as the Big Five. In particular, 
we first assessed some psychometric properties of the BFMS, then we used the 
resulting Big Five dimensions as coordinates of an AB5C taxonomy, and located 561 
personality adjectives within this space. The AB5C taxonomy developed using the 
BFMS factors as coordinates revealed clear cross-cultural consistencies with other 
taxonomies. It is in this sense, therefore, that our study follows an emic-etic 
perspective. This approach to studying personality-related adjectives in Italian has shown that 
the Big Five can be replicated in this language, whereas emic studies have 
demonstrated that the Big Five structure does not emerge in its common form (Caprara & 
Perugini, 1994; De Raad et al., 1998; Di Blas & Forzi, 1999). 

How, then, can we reconcile these findings? We think that the preceding findings 
may appear paradoxical at first glance, but they are actually quite consistent 
and perhaps reveal something more general about the Big Five. We argue that 
the emic lexical structures simply reflect the largest areas of the AB5C space. For 
example, the first component in the Italian emic studies was Conscientiousness: the 
AB5C ordering of the Italian adjectives shows that the bipolar facets III+IV+/III-IV-, 
III+II+/III-II-, and III+/III- include the largest number of terms. Similarly, 
the component Placidity versus Irritability, emerging as the fourth factor in both the 
Roman and the Triestian study, is defined by adjectives assigned to the very large 
IL-IV e/II-1V- cell. Therefore, when data are analyzed from an етіс perspective, the 
main axes appear to be positioned in correspondence of the most filled areas of ad- 
jectives. Interestingly, plots of the bi-dimensional spaces formed by the Big Five 
factors in different languages show that the so-called mixed sectors are often more 
represented and better defined than the so-called pure sectors. This is true for the 
first and second factor, and it is particularly striking for the fourth and fifth factor. 
These latter factors appear to be systematically more defined by mixed sectors rather 
than pure ones. If "the degree of representation of an attribute has some
correspondence with the general importance of the attribute" (Saucier & Goldberg, 1996,
p. 26), then quantitative similarities across taxonomies of different languages are
worth emphasizing, because they suggest what the most important perceived
individual differences across cultures are. In other words, the numbers of personality
adjectives assigned to the AB5C sectors indicate that personality attributes different
from the Big Five (or some of them) may be basically important according to
everyday personality theories. For example, this is the case for sections located between
BF I and IV (I+IV+ vs. I-IV-, that is, Assurance vs. Insecurity), BF I and II (I+II+
vs. I-II-, that is, Friendliness vs. Interpersonal coldness), BF II and III (II+III+
vs. II-III-, that is, Reliability vs. Unreliability), and BF II and IV (II+IV+ vs.
II-IV-, that is, Placidity vs. Irritability). Of course, this is not the only possible
interpretation of the results, and our speculation needs independent empirical support in
future studies.
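
The sector labels used above (e.g., II+IV+ vs. II-IV-) can be derived mechanically from an adjective's loadings on the five factors. The following Python sketch only illustrates that idea under simplifying assumptions of ours: an adjective is labelled by its highest and second-highest absolute loadings, and it is treated as "pure" when it lies within 15 degrees of its primary axis. The function name `ab5c_label`, the 15-degree threshold, and the example loadings are hypothetical and are not taken from the study.

```python
import math

ROMAN = ["I", "II", "III", "IV", "V"]


def ab5c_label(loadings, pure_angle_deg=15.0):
    """Label an adjective by its primary and secondary Big Five loadings.

    `loadings` holds the five loadings in the order I..V. The rule used
    here is a simplification chosen for illustration, not the published
    AB5C assignment procedure.
    """
    # Rank the factors by the absolute size of the loading.
    ranked = sorted(range(5), key=lambda j: abs(loadings[j]), reverse=True)
    first, second = ranked[0], ranked[1]

    def sign(value):
        return "+" if value >= 0 else "-"

    primary = ROMAN[first] + sign(loadings[first])
    # Angle between the adjective and its primary axis within the plane
    # spanned by the primary and secondary factors.
    angle = math.degrees(math.atan2(abs(loadings[second]), abs(loadings[first])))
    if angle < pure_angle_deg:
        return primary + primary  # pure sector, e.g. "III+III+"
    return primary + ROMAN[second] + sign(loadings[second])


# Hypothetical loadings, for illustration only.
print(ab5c_label([0.10, 0.45, 0.05, 0.40, 0.00]))    # mixed sector of II and IV
print(ab5c_label([0.05, 0.02, 0.60, -0.08, 0.03]))   # close to the III axis
```

Counting how many adjectives receive each label gives the kind of sector frequencies that the argument above relies on.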

Second, our results do not solve the question about the universality of the Big
Five. By definition, any study in a single country cannot be sufficient to establish
universality. However, our results allow us to elaborate on this issue. Here, we showed
that a Big Five factor structure and an AB5C taxonomy can be developed from the joint
Italian data set. The Big Five factors can be reliably assessed in the Italian context
using the BFMS, and the AB5C taxonomy can be very valuable in understanding
other dimensions which can be located in this space. The Big Five, however,
emerged mainly when forcing the variables to conform to a Big Five structure.
Therefore, these findings provide evidence that the Big Five are among the most
important categories for ordering personality attributes. However, they are not
necessarily the universal coordinates of personality unless we so decide. Moreover, the
distribution of the adjectives in the AB5C cells, consistent with the AB5C ordering
of personality adjectives in Dutch and in English, indicates that other categories,
mainly consisting of mixed factors when adopting a conventional Big Five map,
may be equally if not more important for lay people. Some of these "mixed"
factors may well prove to be of more practical importance than their pure
counterparts, and this might lead to a revision of the kind of conventional map that is
most convenient in a given language or across languages. The Big Five represent a very
significant achievement of personality psychology and a point of no return in
personality research. But, ultimately, time will tell whether they will maintain this
status in the new century or will rather represent a starting point from which other,
more convenient taxonomies will be developed.


Appendix 1. BFMS orthogonalized factor scores (OFS) from raw data after
standardization (SRS) (see Table 4).


OFS I BF   =   1.051*SRS I - .077*SRS II + .069*SRS III - .057*SRS IV - .149*SRS V.
OFS II BF  = - .075*SRS I + 1.022*SRS II - .042*SRS III - .067*SRS IV - .024*SRS V.
OFS III BF =   .071*SRS I - .043*SRS II + 1.062*SRS III - .196*SRS IV + .012*SRS V.
OFS IV BF  = - .059*SRS I - .070*SRS II - .197*SRS III + 1.066*SRS IV + .015*SRS V.
OFS V BF   =   .147*SRS I - .024*SRS II + .011*SRS III + .015*SRS IV + 1.034*SRS V.
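
The weights of Appendix 1 can be applied mechanically to a respondent's five standardized raw scores. The Python sketch below is a minimal illustration, assuming the scores SRS I to SRS V are already standardized; the function name `orthogonalized_scores` and the example profile are hypothetical, and the weight for SRS III in the last row is taken to be .011, which is an assumption on our part.

```python
# Rows: orthogonalized factor scores OFS I..V.
# Columns: standardized raw scores SRS I..V (weights from Appendix 1).
WEIGHTS = [
    [1.051, -0.077, 0.069, -0.057, -0.149],
    [-0.075, 1.022, -0.042, -0.067, -0.024],
    [0.071, -0.043, 1.062, -0.196, 0.012],
    [-0.059, -0.070, -0.197, 1.066, 0.015],
    [0.147, -0.024, 0.011, 0.015, 1.034],  # third weight assumed to be .011
]


def orthogonalized_scores(srs):
    """Return the five OFS values for one respondent's standardized scores."""
    if len(srs) != 5:
        raise ValueError("expected five standardized raw scores (I..V)")
    return [sum(w * s for w, s in zip(row, srs)) for row in WEIGHTS]


# Hypothetical respondent: one standard deviation above the mean on SRS I only.
profile = [1.0, 0.0, 0.0, 0.0, 0.0]
ofs = orthogonalized_scores(profile)
print([round(value, 3) for value in ofs])
```

With this profile the result simply reproduces the first column of the weight matrix, which is a convenient check that rows and columns have not been transposed.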


Chapter 13 


Japanese Adjective List for the Big Five 


Shigeo Kashiwagi 


Introduction 


In Japan, Aoki (1974) was the first to adopt the psycholexical approach to study per- 
sonality traits. He published his Dictionary for personality traits containing about 
2,400 Japanese adjectives describing traits, which were classified by Aoki into seven 
bipolar categories, namely politeness versus selfishness, gentleness versus stubborn- 
ness, sociability versus shyness, kindness versus cool-heartedness, activity versus 
impatience, steadiness versus carelessness, and brightness versus stupidity. Under 
the influence of psychological studies both in the United States and in Europe (e.g., 
John, 1990; Hofstee, De Raad, & Goldberg, 1992), several Big Five related investi- 
gations were performed based on Aoki's "dictionary". These were Kashiwagi, 
Wada, and Aoki (1993), Kashiwagi and Yamada (1995), Wada (1996), Kashiwagi 
and Wada (1996), and Kashiwagi (1999). All were published in Japanese, with 
only brief English summaries. These adjective-based studies were performed in
order to see whether the hypothetical Big Five model would hold in Japanese.

Kashiwagi et al. (1993), for example, confirmed the relevance of the Big Five by
applying different methods of rotation. They asked a group of 583 university
students (348 men and 235 women) to respond to a list of 200 adjectives, using a
seven-point scale. Those 200 adjectives had been selected by Wada (1991) from the
2,400 adjectives in Aoki's dictionary as being suitable for describing the Big Five
concepts. Kashiwagi et al. (1993) first applied the incomplete orthogonal procrustes
factor rotation method of Browne (1972) to the 200 adjectives.

With this method it is possible to rotate the factorial structure towards a target
structure that specifies the assumed theoretical model (i.e., the Big Five), under the
constraint of factor orthogonality. Browne's method has been shown to perform well
under these conditions (see Kashiwagi et al., 1993; Kashiwagi & Yamada, 1995;
Kashiwagi & Wada, 1995; Kashiwagi, 1995). Moreover, if used iteratively, it allows
the selection of items that best approximate the target simple structure.

Upon applying this rotation method, 131 adjectives turned out to provide the best 
tentative approximation of the Big Five; 69 adjectives were deleted. Next, the same 
factor rotation was applied to this reduced set of 131 adjectives, and subsequently to 
the smaller sets of 110 and 91 adjectives, until the Big Five assumption could be 
confirmed more satisfactorily. Finally, the 91 adjectives were accepted as the 
optimal set. More details concerning the computational aspects of this method of 
Browne can be found in the paragraphs "Structure of trait adjectives" and "Concur- 
rent validity: the Big Five versus the psychoanalytic concepts" of this chapter. 

Then, in order to polish up the orthogonal solution in the sense of a Big Five
simple structure, and to get to know the correlations among the primary axes, the
incomplete oblique procrustes factor rotation of Jöreskog (see Mulaik, 1972) was applied.
The method of Jöreskog may be considered the oblique counterpart of the incomplete
orthogonal procrustes factor rotation method of Browne (1972): it relaxes the
orthogonality constraints and allows for correlated factors. Some computational
aspects of the method of Jöreskog are described in the paragraph "Big Five and
interdependencies" of this chapter.

The incomplete oblique procrustes factor rotation method of Jöreskog yielded
possibly the first Big Five adjective structure in Japan. The numbers of adjectives
for Factor I (Extraversion) through Factor V (Openness to Experience) were 24, 21,
17, 10, and 19, respectively. The α-coefficients for the five factors were .92, .88,
.87, .84 and .88, respectively. The correlations among the primary axes indicated 
that the pairs Factor I (Extraversion) and Factor V (Openness to Experience), Factor 
V (Openness to Experience) and Factor II (Agreeableness), and Factor II (Agree- 
ableness) and Factor III (Conscientiousness) were each mutually correlated, which 
suggested that Factor IV (Neuroticism) might be relatively independent of the other 
four factors. 

Following the work of Piedmont, McCrae, and Costa (1991), Wada (1996) and 
Kashiwagi and Wada (1996) also investigated the Big Five structure concurrently 
through joint factor analyses on data from both the Chiba University Personality 
Inventory (CUPI; Yanai, Kashiwagi, & Kokushou, 1987) and the previously men- 
tioned list of 91 Japanese adjectives. The CUPI has twelve factors or scales each 
consisting of ten items. The factors or scales are named Neuroticism, Depression, 
Inferiority Complex, Extraversion, Activity, Stand Out, Enterprising, Aggressive- 
ness, Agreeableness, Empathetic, Conscientiousness, and Endurance, respectively. 
The CUPI contains 120 Japanese items of the sentence type, such as “Others often 
tell me that I look always alive and vivid" for the factor Extraversion, "I like to help 
even an unknown person when in trouble" for the factor Agreeableness, "I always 
make plans before I perform anything" for the factor Conscientiousness, "I worry
whenever I fail even in trivial matters" for the factor Neuroticism, "I like to come up
with new ideas that no one has proposed before" for the factor Enterprising, and the 
like. In particular, Wada (1996) applied the Promax method, which is an oblique 
procrustes factor rotation method through a reference structure to a primary pattern 
based on the Varimax solution as an initial factor position (see Mulaik, 1972), to the 
factors and scales of both the CUPI and the Big Five based 91 adjectives. They
confirmed that the scales Neuroticism, Depression, and Inferiority Complex of the
CUPI correspond to the trait Neuroticism in the 91 adjectives list. Furthermore, they 
confirmed that Extraversion, Activity, and Stand Out of the CUPI correspond to 
Extraversion in the 91 list, Enterprising of the CUPI to Openness to Experience in 
the 91 list, Aggressiveness, Agreeableness, and Empathetic of the CUPI to Agree- 
ableness in the 91 list, and Conscientiousness and Endurance of the CUPI to Consci- 
entiousness in the 91 list, respectively. 

Moreover, Kashiwagi (1999) demonstrated that the factor structure of the Tokyo 
University Egogram (TEG; Suematsu, Wada, Nomura, & Tamura, 1995), which was 
constructed on the basis of psychoanalytic concepts, could be circumplexically or- 
ganized in terms of the Big Five. He did this through the incomplete orthogonal pro- 
crustes factor rotation method of Browne (1972). Some parts of this study are dis- 
cussed in the paragraph "Concurrent validity; the Big Five versus psychoanalytic 
concepts" of this chapter. 

Following this brief history of the findings, the possibility of arriving at a more 
refined Japanese Adjective List will be investigated. To that end, the development of 
trait-facet systems and the issue of using single adjectives versus sentences will be 
discussed. 


The construction of a trait-facet system 


The construction of a hierarchical Big Five structure containing both traits and fac- 
ets can be approached in different ways. The most common approach is the one used 
in the NEO-PI-R (Costa & McCrae, 1991) which may be considered as one of the 
few broadly used and standardized personality inventories of the sentence type. It 
has Extraversion (E), Agreeableness (A), Conscientiousness (C), Neuroticism (N), 
and Openness to Experience (O) as the Five Factor labels. Each factor scale consists
of six facets and each facet is described by eight items. Thus, the total number of 
items in the inventory amounts to 240. On data obtained for the 240 items, a Vari- 
max rotation was applied and then the Big Five structure was identified in the sense 
of simple structure. A similar Big Five structure was also found after Varimax
rotation of the thirty facets. However, as the number of trait factors was the same in
both solutions, discriminating between a trait factor and its six related facets rests
largely on intuition. In other words, as the two solutions do not differ in the sense of
hierarchical levels, the discrimination between a trait factor and its six related
facets may not be easy. For example, the trait factor Extraversion may not be
discriminated from its facet Activity through the two Big Five solutions alone, unless
the hierarchical relations between trait factors and facets are assumed before
starting the analysis.

Hofstee et al. (1992) adopted the so-called Abridged Big Five Dimensional
Circumplex (AB5C) approach to seek the hierarchical relations between trait factors
and facets in 636 adjectives. They estimated the factor loading matrix on the basis of
100 adjective markers of the Big Five. In other words, they obtained the rotated
factor loading matrix approximately by making use of the principle A = k Z'F,
where A, k, Z', and F respectively are the factor loading matrix to be estimated
approximately, a scalar, the transposed standardized score matrix, and the factor score
matrix based on the marker items. The adjectives in the approximately obtained
orthogonal factor loading matrix were each grouped into psychologically
interpretable facets by plotting them graphically onto their circumplex planes.
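
The principle A = k Z'F can be made concrete in a few lines of code. The Python sketch below assumes, as our own simplification, that the factor scores are standardized and mutually uncorrelated and that k = 1/N (N being the number of respondents), so that each estimated loading is the correlation between an adjective and a factor score. The function name `estimate_loadings` and the toy data are hypothetical.

```python
import math


def estimate_loadings(Z, F, k=None):
    """Approximate the loading matrix as A = k * Z'F.

    Z: N x p list of standardized adjective scores (rows are respondents).
    F: N x m list of standardized factor scores based on the marker items.
    k: scalar; 1/N is assumed when it is not given.
    """
    n = len(Z)
    if len(F) != n:
        raise ValueError("Z and F must describe the same respondents")
    if k is None:
        k = 1.0 / n
    p, m = len(Z[0]), len(F[0])
    return [
        [k * sum(Z[i][v] * F[i][f] for i in range(n)) for f in range(m)]
        for v in range(p)
    ]


# Toy data: four respondents, two uncorrelated standardized factor scores.
F = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
root2 = math.sqrt(2.0)
# Adjective 1 copies factor 1, adjective 2 copies factor 2, and adjective 3
# is an equal blend of both (rescaled to unit variance).
Z = [
    [1.0, 1.0, root2],
    [1.0, -1.0, 0.0],
    [-1.0, 1.0, 0.0],
    [-1.0, -1.0, -root2],
]
A = estimate_loadings(Z, F)
for row in A:
    print([round(a, 3) for a in row])
```

In this toy case the blended adjective loads equally on both factors, which is the kind of item that ends up between two axes when plotted on a circumplex plane.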

Although the AB5C approach may be considered an improvement over the approach
followed by Costa and McCrae, in the sense that it discriminates a trait factor from
its related facets more objectively, the grouping of factor loadings in a
two-dimensional or circumplex space still relies on inspection to select adjectives
efficiently. In other words, the selection is speeded up by assumptions, made before
starting the analysis, about the hierarchical relations of trait factors to facets.

A possible new type of approach to discriminate a trait factor from its related fac- 
ets is discussed here for the construction of personality inventories. The Varimax 
and the Orthomax criteria, instead of the Varimax alone, are applied to a specific 
principal factor matrix in order to attain both the trait factors and the facets in an 
objective way. In other words, the Varimax is applied to attain the Big Five trait 
factors, and next one of the Orthomax criteria, which contain Quartimax, Varimax, 
Equamax, Parsimax, Factor-parsimony and others (see Mulaik, 1972), is applied to 
attain the related facets in each trait factor. Users can easily choose one of the
Orthomax criteria in common standard statistical packages such as SAS. The main
reason for applying the Orthomax instead of the Varimax to define facets is that the
Varimax may sometimes provide biased results when the number of items or the
number of factors or both are large (see Kashiwagi, 1965; Hofstee et al., 1992;
Goldberg, Sweeney, Merenda, & Hughes, 1996).
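
The members of the Orthomax family differ only in the weight (often written gamma) given to the column-variance term of one common criterion. The Python sketch below lists these weights as they are commonly tabulated, for instance in the documentation of SAS PROC FACTOR, and evaluates the raw criterion for a given loading matrix. The function names are hypothetical, and Kaiser's row normalization is left out for brevity.

```python
def orthomax_gamma(criterion, n_items, n_factors):
    """Return the Orthomax weight gamma for a named criterion."""
    p, m = n_items, n_factors
    weights = {
        "quartimax": 0.0,
        "varimax": 1.0,
        "equamax": m / 2.0,
        "parsimax": p * (m - 1) / (p + m - 2),
        "factor_parsimony": float(p),
    }
    return weights[criterion]


def orthomax_value(loadings, gamma):
    """Raw Orthomax criterion: larger values indicate a simpler structure."""
    p = len(loadings)
    total = 0.0
    for j in range(len(loadings[0])):
        squares = [row[j] ** 2 for row in loadings]
        total += sum(s * s for s in squares) - gamma / p * sum(squares) ** 2
    return total


# Weights for 120 items in fifteen factors, and for 120 items in twelve factors.
print(round(orthomax_gamma("parsimax", 120, 15), 3))
print(orthomax_gamma("equamax", 120, 12))

# A simple structure scores higher than an evenly spread one (varimax weight).
simple = [[1.0, 0.0], [0.0, 1.0]]
spread = [[0.7071, 0.7071], [0.7071, -0.7071]]
print(orthomax_value(simple, 1.0) > orthomax_value(spread, 1.0))
```

Because the weight grows with the number of items and factors for Equamax and Parsimax, these criteria spread the variance more evenly over many factors than Varimax does, which is the property exploited when facets rather than broad traits are sought.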

Take, for instance, the CUPI, described briefly above. Yanai et al. (1987) failed
at an initial stage to attain the Varimax solution for 120 variables in twelve factors in 
the sense of simple structure. In other words, the Varimax did not work well and did 
not provide an interpretable answer from a psychological point of view. However, in 
order to get around this kind of trouble, a more psychologically interpretable and 
satisfactory answer in the sense of simple structure could be attained by applying the 
Equamax as one of the Orthomax criteria. Although the uses of the Orthomax crite- 
ria, instead of Varimax, may not be so popular at present, the application of the 
Varimax is not always appropriate for the reason given above. Moreover, even if 
the assumption for the Big Five based on the vast number of variables is confirmed, 
the facets in a number of factors larger than five may not always be interpretable 
when the Varimax is applied. In general, unless Varimax solutions can meet satis- 
factorily the user's pre-assigned assumption, the applications of the Orthomax crite- 
ria instead of Varimax should be considered. It may very well be possible to attain 
more satisfactory solutions from a theoretical point of view. 


Adjectives versus sentences


We distinguish two types of personality inventories, one consisting of adjectives and 
the other consisting of sentences, henceforth respectively called the adjective type 
and the sentence type inventories. On the basis of common sense, the sentence type 
inventory may be preferred over the adjective type because adjectives are more ab- 
stract and inexact than sentences both in evaluating and in describing personality 
traits (see Widiger & Trull, 1997). In fact, almost all the widely used personality 
inventories, such as the NEO-PI-R, belong to the sentence type inventory. Goldberg 
(1999) asserted that the personality inventories of the sentence type are preferable, 
and consequently he presented the International Personality Item Pool (IPIP) con- 
sisting of short behavior-descriptive sentences. He used an item format that is more 
contextualized and thus longer than trait adjectives, yet more compact and thus 
shorter than the items in most personality inventories. The Groningen personality 
team of Hofstee, De Raad, and Hendriks has been the major proponent of this item 
format, and they have used it to develop an initial pool of Dutch items which might 
cover many of the facets of the Big Five structure (Hendriks, 1997). Based on their 
approach, Goldberg (1999) has worked further with their item pool and has proposed
the IPIP in English. 

Although adjectives may be more inexact and less contextualized than sentences
in describing and evaluating personality traits, the advantage of personality
inventories of the adjective type is that they are less time-consuming and more easily
understood by foreigners. Moreover, studies by Piedmont et al. (1991) and Kashiwagi
(1999) show that, also from a test-theoretic point of view, personality inventories of
the adjective type need not always be inferior to those of the sentence type. For
example, the α-coefficient obtained with adjectives may be larger than the one
obtained with sentences, given the same number of variables for facets or traits.
From a factor-analytic point of view, however, there are, to my knowledge, very few
examples of personality inventories of the adjective type that satisfy the trait-facet
system. Therefore, it is of great interest to develop further a personality inventory
of the adjective type, which can be evaluated for both traits and facets
simultaneously. This is one of the main aims of this chapter.


Method 


Two kinds of factor analytic studies were conducted. The first analysis (Study-1) 
involves the proposal of a Japanese Big Five Adjective List consisting of 120 adjec- 
tives, not to be confused with the CUPI (Chiba University Personality Inventory) 
which is also composed of 120 items. 

The second analysis (Study-2) involves a hierarchical evaluation form based on
traits and facets, using 105 items which were selected from the Adjective List
achieved in Study-1. The data that were used for both analyses were obtained from
218 subjects, all university students, among whom were 33 foreign students. Especially
considering the foreign students, who may not be so familiar with the Japanese
words, an easy format was used.


Study-1 


Material and procedure 


The 200 adjectives selected by Wada (1991) from Aoki's "dictionary" were used.
The subjects were asked to provide self-ratings using a 3-point scale, with the
answering possibilities "Yes", "Questionable", and "No", scored as 1, 0, and -1,
respectively.


Results 


Structure of trait adjectives 


The self-ratings were factored following the principal axes method. Five factors
were extracted, which were rotated according to Varimax. As some items turned out
to be rather complex in terms of a Big Five simple structure, the number of items
was decreased to 120 in the four steps described below, by successively applying
factor analyses to sets of 200, 180, 160, and 120 items.

At the first step, Varimax was applied to the 200 adjectives, and 20 items that
were complex in the sense of Big Five simple structure were deleted. At the second
step, analyses were done on the remaining 180 adjectives. A target matrix was
constructed in such a way that the Varimax factor loadings smaller and larger
than .35 were set to be zero and unknown, respectively, and the incomplete
orthogonal procrustes factor rotation method of Browne (1972) was applied. In this way
the sum of the squared factor loadings corresponding to the zero elements in the
target matrix was minimized in the least-squares sense, under the constraint that the
rotated factors remain orthogonal. Then each rotated value was converted into zero
or unknown, yielding a new target matrix. At the third step, the initial target and the
newly obtained one were compared element by element, the twenty items for which
the two targets did not match were deleted, and the incomplete orthogonal
procrustes factor rotation method was applied to the remaining 160 adjectives. At
the fourth and final step, after deleting forty items on the basis of the same
comparison of both targets, the incomplete orthogonal procrustes factor rotation
method of Browne (1972) was applied to the 120 adjectives. At this stage, the initial
and obtained targets agreed perfectly with each other in terms of their corresponding
converted elements, and they were also very satisfactory in the sense of the Big Five
simple structure. Therefore, the 120 adjectives were finally retained.
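
The logic of these steps (build a target from the loadings, rotate orthogonally so that the loadings in the zero cells become as small as possible, then rebuild the target) can be sketched in Python. The sketch below is not Browne's published algorithm: it is a plain plane-by-plane rotation that minimizes the same least-squares quantity, written under our own assumptions. The function names, the use of absolute values with the .35 cutoff, and the toy loadings are hypothetical.

```python
import math


def make_target(loadings, cutoff=0.35):
    """Mark small loadings as specified zeros (0), the rest as unknown (None)."""
    return [[0 if abs(a) < cutoff else None for a in row] for row in loadings]


def zero_target_loss(loadings, target):
    """Sum of squared loadings in the cells that the target specifies as zero."""
    return sum(
        a * a
        for row, trow in zip(loadings, target)
        for a, t in zip(row, trow)
        if t == 0
    )


def rotate_to_target(loadings, target, tol=1e-8, max_sweeps=100):
    """Orthogonal rotation by successive plane rotations reducing the loss."""
    A = [row[:] for row in loadings]
    m = len(A[0])
    for _ in range(max_sweeps):
        largest_angle = 0.0
        for p in range(m - 1):
            for q in range(p + 1, m):
                a = b = c = 0.0
                for row, trow in zip(A, target):
                    x, y = row[p], row[q]
                    if trow[p] == 0:  # zero specified in column p
                        a += x * x
                        b += y * y
                        c += x * y
                    if trow[q] == 0:  # zero specified in column q
                        a += y * y
                        b += x * x
                        c -= x * y
                half_diff = (a - b) / 2.0
                if abs(c) < 1e-15 and abs(half_diff) < 1e-15:
                    continue  # this plane carries no information
                # Loss in this plane: const + half_diff*cos(2t) + c*sin(2t).
                theta = 0.5 * math.atan2(-c, -half_diff)
                cs, sn = math.cos(theta), math.sin(theta)
                for row in A:
                    x, y = row[p], row[q]
                    row[p], row[q] = x * cs + y * sn, -x * sn + y * cs
                largest_angle = max(largest_angle, abs(theta))
        if largest_angle < tol:
            break
    return A


# Toy example: a clean two-factor structure, turned by 30 degrees.
reference = [[0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.9]]
target = make_target(reference)
cs30, sn30 = math.cos(math.radians(30)), math.sin(math.radians(30))
start = [[x * cs30 + y * sn30, -x * sn30 + y * cs30] for x, y in reference]
rotated = rotate_to_target(start, target)
print(round(zero_target_loss(start, target), 3))
print(round(zero_target_loss(rotated, target), 6))
```

Applying `make_target` to the rotated loadings and comparing the result with the initial target mirrors the item-deletion check described above: items whose cells disagree between the two targets are candidates for removal.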

The Varimax rotation was then applied to the 120 adjectives attained through the 
preceding four steps, the solution of which is presented in Table 1. The Japanese 
adjectives are presented in their alphabetic or Rooma-Ji form, and the translations in 
English are presented as well. The author translated the adjectives into English
by consulting the works of Bond, Nakasato, and Shiraishi (1975), Isaka (1990),
Hofstee et al. (1992), and Goldberg et al. (1996). The study by Hofstee et al. (1992)
was translated into Japanese by Murakami and Murakami (1999) and the study by
Goldberg et al. (1996) was translated into Japanese by the present author.

In Table 1, the symbols between parentheses are labels for facets (e.g., E2, E1 for
Sociable), to be explained in Study-2. For each trait factor, the trait adjectives are
grouped into those with a positive loading and those with a negative loading,
arranged from largest to smallest with regard to their absolute values. The values
over |.35| are shown in boldface, and, if a trait adjective has further factor loadings
over |.35|, they are shown together with the trait factor symbol in parentheses.
"Humorous", for example, loads on both the trait factors E and O with the values .38
and .36, respectively. The numbers of positively loading trait adjectives and
negatively loading ones are, respectively, 19 and 10 for Extraversion, 16 and 4 for
Agreeableness, 9 and 12 for Conscientiousness, 1 and 25 for Neuroticism, and 24
and 0 for Openness to Experience. The corresponding α-coefficients are .93, .88,
.86, .91, and .90, respectively.
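
For readers who wish to compute such α-coefficients themselves, the Python sketch below implements the usual formula for coefficient alpha on a respondents-by-items table. It assumes that all items of a scale are keyed in the same direction, so that negatively loading adjectives have been reversed beforehand. The function name `cronbach_alpha` and the small data set, scored 1, 0, and -1 as in the present rating format, are hypothetical.

```python
def cronbach_alpha(scores):
    """Coefficient alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")

    def variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical ratings of five respondents on three adjectives of one scale.
ratings = [
    [1, 1, 1],
    [1, 0, 1],
    [0, 0, -1],
    [-1, -1, -1],
    [0, 1, 0],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 2))
```

Because alpha rises with the number of items, comparisons between adjective and sentence inventories are only meaningful for scales of equal length.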


Big Five and interdependencies 


In order to investigate the degree of interdependencies among the Big Five trait
factors, the incomplete oblique procrustes factor rotation method of Jöreskog (Mulaik,
1972, pp. 314-318) was applied. According to this method, the sum of the squared
elements in the obtained factor rotation matrix corresponding to the zero elements in
the target matrix is minimized so as to attain indirectly the primary pattern through
the reference structure. Although this method may provide rather oblique
solutions, they are not affected by the initial factor positions. Each row vector in the
target matrix corresponding to the adjectives belonging to the trait factor
Extraversion in Table 1 is given by [? 0 0 0 0]. Those for the trait factor Agreeableness are
given by [0 ? 0 0 0], those for Conscientiousness are [0 0 ? 0 0], for Neuroticism
[0 0 0 ? 0], and for Openness to Experience [0 0 0 0 ?]. The question mark "?" and the
number "0" respectively represent the unknown and zero elements in the target
matrix.


Table 1. Varimax solution for the Big Five in the 120 Japanese Adjectives List.


Adjectives 
Factor I (Extraversion: E)
Shakoutekina 
Akarui 
Katsudoutekina 
Gaikoutekina 
Kaikatsuna 
Youkina 
Tomodachinoooi 
Sekkyokutekina 
Hitonattsukkoi 
Hanashizukina 
Aisonoyoi 
Genki 
Koudou-han-inohiroi 
Riidaashippunoaru 
Kigaruna 
Hyoukinna 
Iyokutekina
Hitotoisshogasuki 
Yuumoanoaru 


Mukuchina 
Uchikina 
Hitogirai 
Uchitokenai 
Jimina 
Buaisona 
Ishihyoujishinai 
Naiseiteki 
Fusagigachi 
Kodokuna 


Factor II (Agreeableness: A)
Onkouna 

Odayakana 
Hankoutekidenai 
Ryoushintekina 

Onwa 

Shinsetsuna 

Yasashii 
Kyouryokutekina 
Hitonoyoi 
Reigitadashii 
Kinonagai 
Hitoatarinoyoi 
Kandaina 
Omoiyarinoaru 
Nasakebukai 
Jikochuushintekidenai 
Sociable
Cheerful 
Active 
Extraverted 
Merry 
Effervescent 
Good-mixer 
Aggressive 
Affable 
Talkative 
Amiable 
Energetic 
Social 
Vibrant 
not-Reluctant 
Jocular 
Enthusiastic 
Gregarious 
Humorous 


Quiet 

Shy 
Unsociable 
not-Affable 
Sober 
not-Amiable 
Aloof 
Introspective 
Depressed 
Alone 


Cordial 
Mild 


not-Antagonistic 


Honest 
Gentle 
Generous 
Affectionate 
Helpful 
Good-natured 
Polite 

Patient 
Agreeable 
Charitable 
Sympathetic 
Compassionate 


not-Self-centered 


Facets 


(E2, E1) 
(E4, E2, E1) 
(E3, E2) 
(E3) 
(E3, E2, E4) 
(E2) 
(E3, E1) 
(E3) 
(E2) 
(E4) 
(E1) 
(E3) 
(E2) 
(E4) 
(E4) 
(E2) 
(E3) 
(E4, N1) 
(E2, O1) 


(E1, E2, E4)
(E1, N1) 
(E1) 
(E1) 
(E1) 

(E1)
(E1) 
(E4) 
(E1) 


(A3) 
(A3)
(A2)
(A1, A3) 
(A3, A1) 
(A1) 
(A1) 
(A1) 
(A1) 
(A1, C2) 
(A3, A2) 


(A3) 
(A1) 
(A1) 
(A2) 


Loadings 


275 


.44 


.38 (.36; O)


.67 
.63 
.56 


-.55 


.54 
.48 
.47 
.46 


.45 (-.35; N) 


.36 


(.35; E) 


(.42; E) 


Hankouteki 
Togenoaru 
Wagamamana 
Zukezukemonowoiu 


Factor III (Conscientiousness: C)


Kichoumenna 
Sekininkannotsuyoi 
Rouwooshimanai 
Ganbaru 
Keikakuseinoaru 
Teineina 

Kinbenna 
Shinchouna 
Faitonoaru 


Iikagenna
Keisotsuna 
Musekininna 
Akkippoi 
Keihakuna 
Ruuzuna 
Bukkirabou 
Kimagure 
Utsurigina 
Taidana 
Rakkantekina 
Ishinoyowai 


Factor IV (Neuroticism: N) 


Kuyokuyoshinai 


Bikubikusuru 
Kinochiisai 
Urotaeru 
Odoodosuru 
Douyoushiyasui 
Okubyouna 
Fuanninariyasui 
Dokyouganai 
Shinkeishitsuna 
Yuuutsuna 
Taninwokinisuru 
Awateyasui 
Hikantekina 
Mijimena 
Sabishigariya 
Shinpaishou 
Rettoukannotsuyoi 
Hazukashigariya 
Nayamigachi 
Kinchousuru 
Zaiakukannotsuyoi 


Antagonistic 
Harsh 
Selfish 
Abusive 


Neat 
Responsible 
Diligent 
Hardworking 
Systematic 
Respectful 
Industrious 
Cautious 
Spirited 


Disorganized 
Rash 
Irresponsible 
Weary 
Frivolous 
Unpunctual 
Blunt 
Illogical
Fickle 
Negligent 
Optimistic 
Weak-will 


Unworried 


Fearful 
Timid 
Upset 
Fidgety 
Restless 
Cowardly 
Insecure 
not-Daring 
Nervous 
Gloomy 
Apprehensive 
Unstable 
Pessimistic 
Miserable 
Lonely 
Anxious 
Defensive 
Bashful 
High-strung 
Tense 
Self-critical 
(A2, A3)
(A2) 
(A2) 

(A2, O3)


(C2) 
(C1, N3) 


(C2) 
(C2) 
(C2) 
(C2) 


(C1) 
(C1) 
(C1) 
(C1) 
(C1, A2) 
(C2) 


(C1) 
(C1) 


(C1) 


(N2, N1) 
(N1, N2) 
(N2) 
(N2) 
(N2) 
(N1, N2) 
(N2) 
(N1) 
(N3) 
(N3) 
(N2) 
(N2) 
(N1, N3) 
(N3) 


(N1, N3) 
(N1) 
(N3) 
(N1) 
(N3) 
07
-.41 


.46


-.68 
-.68 
-.68 
-.62 
-.61 
-.60 
-.59 
-.58 
-.58 
-.57 
-.53 
-.53 (-.35; C) 
-.53 
-.48 
-.47 
-.46 
-.46 
-.45 
-.44 
-.44 
-.43 
Hitonoiinari
Kizutsukiyasui 
Kigurounoooi 
Hunbetsunoaru 


Factor V (Openness to experience: O) 


Atamanokaitennohayai 
Omoitsukinoyoi 
Nouritsunoyoi 
Kitennokiku 
Yuunouna 
Atamanoyoi 
Aideanoyoi 

Tasaina 
Handannohayai 
Souzouryokunitonda 
Bitekikankakunosurudoi 
Nomikominohayai 
Dousatsuryokunoaru 
Dokusoutekina 
Chakusougaii 
Kannoii 

Kibinna 
Tayorigainoaru 
Dokuritsushita 
Rinkiouhenna 
Kanjounoyutakana 
Kidatenoyoi 
Yuuzuunokiku 
Tegatai 
Dependent

Vulnerable (N3)
Worried (N3)
Sensible

Smart (O2, O1)
Insightful (O1, O2)
Proficient (O2)
Flexible (O2, O3)
Able (O2)
Intellectual (O2)
Intuitive (O1, O2)
Versatile (O2)
Clever (O3, O2)
Imaginative (O1)
Aesthetic (O1)
Wise (O2)
Perceptive (O1)
Creative (O1)
Witty (O1)
Brilliant (O2)
Prompt (O2)
Dependable (O2)
Independent (O3, E1)
Accommodating (O3)
Artistic (O1, N2)
Soft-hearted

Bright (O3)
Trustworthy


Note: The adjectives are in Japanese Rooma-Ji and English 


.39 
.37 (.36; A) 
.35 
.32 


For space-saving purposes, the whole body of the primary pattern matrix is not 
presented, as the simple structure configuration of the Big Five is almost the same as 
the orthogonal one presented in Table 1. The matrix with correlations among the 
primary axes is presented in Table 2. Table 2 shows substantial correlations among 
the primary axes Factor I (Extraversion) and Factor V (Openness to Experience), 
Factor V (Openness to Experience) and Factor III (Conscientiousness), and Factor 
III (Conscientiousness) and Factor II (Agreeableness). Factor IV (Neuroticism) 
seems to be relatively independent of the other four trait factors. The relations for
I-V and II-III, and the relative independence of IV, were also found for the earlier
described 91-list.


Table 2. Correlation among Primary Axes for the Big Five in 120 Japanese Adjectives 


       A      C      N
E    724    .14    .29
A           .37    .05
C                 -.30
N


Note: The values over |.35| are shown in boldface 


Study-2


In this second study we applied an Orthomax criterion, instead of Varimax, in order
to define facets in an objective way. More specifically, the Parsimax, being one of
the Orthomax criteria, was applied to the Japanese Adjective List with 120
items, obtained through the Varimax in Study-1. However, when applying this
method, some items obtained in Study-1 may be excluded so that the final Big Five
structure for the facets in Study-2 is satisfactory in the sense of simple structure.
An item belonging to a factor through Varimax in Study-1 may be regarded as
belonging to a facet of another factor through the Parsimax in Study-2. "Spirited",
for example, which loads .36 on Factor III (Conscientiousness) through Varimax
(see Table 1), turns out to load on a facet of Factor I (Extraversion) through
Parsimax (see Table 3).


Parsimax solution 


The Parsimax for the orthogonal factor rotation (see Mulaik, 1972, pp. 263-265) was 
applied to the same data as used in Study-1. Before starting the analysis, the number 
of the facets was assumed to be fifteen, in the hope that every facet might have eight 
items on average and that all the α-coefficients for the facets might be larger than
.75. The computation was continued until each of the rotated factor loadings stabi- 
lized within the tolerance limit of |.001|. The number of the cycles for the iterations 
was 112. The solution is presented in Table 3, where, for space-saving purposes, 
only the factor loadings belonging to the facets for each trait-factor are presented. 
For example, for the trait factor E, only the factor loadings corresponding to the 
four facets El to E4 are presented; others are not. In each trait-factor, the adjectives 
evaluated positively are different from those evaluated negatively, and they are 
grouped into two parts. In each part, they are arranged from the largest to the small- 
est in terms of the absolute values with regard to factor loadings. 
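
The rotation step described above can be sketched in code. The sketch below is an
illustration only and does not reproduce the procedure in Mulaik (1972): it treats
Parsimax as the member of the Orthomax family with weight p(m - 1)/(p + m - 2) for p
items and m factors, and it optimizes the criterion with a generic gradient projection
scheme that is swapped in here for convenience. The function names, the synthetic
loading matrix, and the iteration limits are all assumptions; only the stopping
tolerance of .001 on the loadings and the 120 by 15 dimensions come from the text.

```python
import numpy as np


def parsimax_gamma(n_items: int, n_factors: int) -> float:
    """Orthomax weight that corresponds to Parsimax."""
    return n_items * (n_factors - 1) / (n_items + n_factors - 2)


def orthomax_value_and_gradient(L, gamma):
    """Negated orthomax criterion (to be minimized) and its gradient w.r.t. L."""
    p = L.shape[0]
    col_ss = np.sum(L ** 2, axis=0)            # column sums of squared loadings
    q = np.sum(L ** 4) - (gamma / p) * np.sum(col_ss ** 2)
    grad = L ** 3 - (gamma / p) * L * col_ss   # derivative of q / 4
    return -q / 4.0, -grad


def rotate_orthomax(A, gamma, tol=1e-3, max_iter=1000):
    """Orthogonally rotate the loading matrix A; returns (loadings, rotation)."""
    T = np.eye(A.shape[1])
    L = A @ T
    f, Gq = orthomax_value_and_gradient(L, gamma)
    step = 1.0
    for _ in range(max_iter):
        G = A.T @ Gq
        M = T.T @ G
        Gp = G - T @ (M + M.T) / 2.0           # gradient projected onto the tangent space
        s = np.linalg.norm(Gp)
        if s < 1e-8:
            break
        step *= 2.0
        accepted = False
        for _ in range(30):                    # backtracking line search
            U, _, Vt = np.linalg.svd(T - step * Gp)
            T_try = U @ Vt                     # nearest orthogonal matrix
            L_try = A @ T_try
            f_try, Gq_try = orthomax_value_and_gradient(L_try, gamma)
            if f_try < f - 0.5 * s ** 2 * step:
                accepted = True
                break
            step /= 2.0
        if not accepted:
            break
        change = np.max(np.abs(L_try - L))
        T, L, f, Gq = T_try, L_try, f_try, Gq_try
        if change < tol:                       # every loading stable within the tolerance
            break
    return L, T


# Synthetic stand-in for an unrotated 120-item, 15-factor loading matrix.
rng = np.random.default_rng(0)
unrotated = rng.normal(scale=0.4, size=(120, 15))
rotated, rotation = rotate_orthomax(unrotated, parsimax_gamma(120, 15))
```

With the weight set to 1 the same routine performs Varimax, and with 0 Quartimax,
which is why the text can speak of Parsimax as "one of the Orthomax criteria". The
number of cycles such a routine needs depends on the algorithm and the data, so it
should not be expected to match the 112 cycles reported above.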


Excluded adjectives 


The Big Five assumption, brought about through Varimax, may not always be
reproduced perfectly in the fifteen facets brought about through Parsimax when the
factor loadings over |.35| are considered. When comparing Tables 1 and 3, the
following fifteen adjectives may not always belong to the appropriate facets assumed
previously following the Varimax procedure: Alone (Kodokuna) in E, Agreeable
(Hitoatorinoyoi) in A, Spirited (Faitonoaru) in C, Responsible (Sekininkannotsuyoi)
in C, Hard-working (Ganbaru) in C, Blunt (Bukkirabou) in C, Negligent (Taidana)


Table 3. Parsimax solution for the fifteen facets in the 120 Japanese Adjective List


| Extraversion El E — Ei E 
Amiable .54 .27 -.02 99, 
Jocular .05 .65 114 .03 
Humorous -.10 .61 12 .08 
Affable ‚23 193 .01 .34 
Effervescent ‚29 2527 .21 .30 
Sociable .39 .47 .21 233 
Social .06 .39 .23 .24 
Enthusiastic .04 .05 72 27 
Active .20 ‚42 .49 .23 
Aggressive .20 .26 .48 :23 
Energetic .26 .18 .45 .29 
Good-mixer .36 "3s .43 .20 
Merry .14 .39 41 237) 
Extraverted :25 .34 .37 232 
not-Reluctant .11 :22 -.04 .59 
Cheerful .35 .36 32 ‚46 
Gregarious .10 ‚17 ‚18 .42 (-.43; М1) 
Talkative :27 23/1 .09 .38 
Vibrant zail 215 .24 235 
not-Amiable -.78 -.14 -.03 -.06 
Depressed -.57 -.01 -.21 -.10 
Aloof -.52 -.16 -.14 -.01 
Quiet -.43 -.41 -.17 -.35 
not-Affable -.41 -.25 -.06 -.31 
Shy -.39 -.25 -.25 -.27 
Unsociable -.37 -.25 -.23 -.20 
Sober -.35 -.21 -.24 -.29 
Introspective -.13 -.17 -.19 -.35 
Alone(*) -.28 -.20 -.10 -.27 
ll Agreeableness A1 A2 A3 

Sympathetic .68 .15 .06 

Affectionate ‚67 .10 223 

Generous ‚62 .14 .32 

Good-natured .59 .04 .19 

Honest 53 ails .44 

Compassionate .46 .08 ‚11 

Helpful .39 .23 17 

Polite .36 oils} 227108375207) 
not-Antagonistic .12 .65 231 

not-Self-centered 27 157. .00 

Mild .24 .16 .68 

Cordial .30 „15 ‚66 


Patient -.15 .36 56 


Charitable 
Gentle 
Agreeable(*) 


Selfish 
Antagonistic 
Harsh 
Abusive 


Ш Conscientiousness 


Diligent 

Neat 

Cautious 
Respectful 
Systematic 
Industrious 
Spirited (*) 
Responsible (*) 
Hardworking (*) 


Rash 
Illogical 
Weary 
Fickle 
Irresponsible 
Disorganized 
Optimistic 
Frivolous 
Unpunctual 
Blunt (*) 
Negligent (*) 
Weak-will (*) 


ІМ Neuroticism 


Unworried (*) 
Timid 
Not-Daring 
Tense 
Pessimistic 
Bashful 
Cowardly 
Defensive 
Restless 
Unstable 
Upset 

Fearful 
Fidgety 
Insecure 
Apprehensive 
Miserable 
15 14 49
.38 .20 .44 
27 18 .30 (.36; E1; .35; E4) 
.06 -.62 22 

-.16 -.59 -.35 

-.20 -.50 2122 

-.21 -.36 -.11 (.42; 03) 

C1 C 
.35 ЛО (-.36; N3) 
10 62 
.08 a 
Az 54 
16 54 
ag 51 
ло .25 (.61; ЕЗ) 
33 18 
.20 11 (.64; ЕЗ) 
.66 -20 
.62 -102 
54 247 
54 |7 

2592 -.21 

-.51 -.30 

-.50 07 

-.47 -.16 
34 -.50 

-.30 -.34 (-.41; E1) 

-.19 -.30 

- 43 -.09 
N1 N2 N3 
32 29 23 

-.60 n - 17 

-.55 a2 18 

-.49 -.13 -.18 

-.49 -.14 -.40 

-.46 16 -.06 

-.43 -.43 -.18 

-.38 -.13 -.38 

«27 -.63 -.13 

-.11 -.55 -.29 

-.29 -.51 -.26 

-.43 -.47 -.29 

34 -.45 -.27 
30 -.44 -.34 
.26 -.41 -.20 

-.11 -.19 -.57 
Vulnerable -.15 .03 -.54
Nervous -.06 -.31 -.51 
Self-critical -.05 07 -.51 
Worried > -.05 214. - -.46 
Gloomy -.28 ' 3420 | -.44 
High-strung -.09 -.28 7 -,41 
Dependent (*) -.31 -.23 -.26 
Lonely (*) .19 -.33 -.28 
Anxious (*) -.20 -.28 -.34 
Sensible (*) -.11 -.16 -.10 
V Openness 01 02 03 
Aesthetic .69 .10 .06 
Imaginative .64 15 ‚00 
Creative .62 .02 AE) 
Intuitive 53 .38 -.03 
Perceptive .49 .33 ‚16 
Insightful .42 .42 .26 
Artistic ‚41 ‚08 .23 
Witty .41 ‚23 .09 (.42; E4) 
Wise -12 .67 .03 
Intellectual 22 ‚62 .09 
Smart .39 .61 228 
Proficient 15) .58 .30 
Able .34 .54 .12 
Versatile 27 .50 .20 
Prompt -.04 .47 -31 
Brilliant Fk ‚45 .13 
Flexible .29 .43 .41 
Dependable .07 .38 231 
Accommodating 215 .14 .61 
Clever ‚07 ‚43 .56 
Independent .01 .05 .56 
Bright .15 .08 .54 
Soft-hearted (*) ‚14 1% .14 (.41; A3) 
Trustworthy (*) .05 .24 .00 (.40; C2) 


Note: (*) indicates excluded adjectives in the fifteen facets. Loadings over |.35| are in boldface.


in C, Weak-will (Ishinoyowai) in C, Unworried (Kuyokuyoshinai) in N, Dependent
(Hitonoiinari) in N, Lonely (Sabishigariya) in N, Anxious (Shinpaishou) in N,
Sensible (Hunbetsunoaru) in N, Soft-hearted (Kidatenoyoi) in O, and Trustworthy
(Tegatai) in O. These were excluded from the list of 120 Japanese adjectives; the
remaining 105 Japanese adjectives were adopted for the fifteen facets. The excluded
adjectives are indicated by (*) in Table 3.

In Study-1, the largest absolute factor loading was the most important guide for
classifying an adjective into a specific trait or facet. If that criterion were adopted
here, five more adjectives (Gregarious in E, Polite in A, Abusive in A, Diligent in C,
and Witty in O) would have to be excluded as well (see Table 3); this rather strict
criterion was not followed.
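
The two classification rules discussed above can be made concrete with a small
sketch. The loadings are those printed in Table 3 for four Extraversion adjectives;
reading the cross-loading noted for Gregarious as belonging to facet N1 is an
assumption, as is treating a loading of exactly .35 as salient (Table 3 retains such
items). The function names are hypothetical and serve only the illustration.

```python
THRESHOLD = 0.35  # salience cut-off used for Table 3 (assumed inclusive)

# Loadings as printed in Table 3 (facets E1 to E4), plus the cross-loading
# reported in parentheses for Gregarious (assumed here to refer to facet N1).
LOADINGS = {
    "Energetic":  {"E1": 0.26, "E2": 0.18, "E3": 0.45, "E4": 0.29},
    "Quiet":      {"E1": -0.43, "E2": -0.41, "E3": -0.17, "E4": -0.35},
    "Alone":      {"E1": -0.28, "E2": -0.20, "E3": -0.10, "E4": -0.27},
    "Gregarious": {"E1": 0.10, "E2": 0.17, "E3": 0.18, "E4": 0.42, "N1": -0.43},
}


def salient_home_facets(row, trait, threshold=THRESHOLD):
    """Facets of the item's own trait with |loading| >= threshold."""
    return [f for f, x in row.items() if f.startswith(trait) and abs(x) >= threshold]


def primary_facet(row):
    """Facet with the largest absolute loading (the stricter Study-1 rule)."""
    return max(row, key=lambda f: abs(row[f]))


def classify(row, trait):
    """Apply the rule that was followed and the stricter rule that was not."""
    home = salient_home_facets(row, trait)
    return {
        "facets": home,
        "kept": bool(home),
        "kept_strict": bool(home) and primary_facet(row).startswith(trait),
    }


for adjective, row in LOADINGS.items():
    print(adjective, classify(row, "E"))
```

Under these assumptions Alone has no salient loading on any Extraversion facet and is
dropped, whereas Gregarious is kept by the rule that was followed but would be dropped
by the stricter rule, because its largest absolute loading lies outside Extraversion.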


The final list of 105 Japanese Adjectives


To save space, the results of the Varimax solution of the 105 Japanese adjectives
are not presented. The results, however, strongly resemble the simple structure
configuration of the Big Five. Some statistics for the Big Five scales are presented
in Table 4. The α-coefficients are very satisfactory, with the possible exception of
the value .83 for C. The scores for the scales are each nearly normally distributed,
as indicated by their standard deviations, skewnesses, and kurtoses. Therefore, these
Big Five scales may be considered very suitable for measuring the Big Five traits.
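
As an illustration of the statistics reported in Table 4, the following sketch
computes an α-coefficient, a skewness, and a kurtosis for one scale. It uses synthetic
item scores, not the study's data; the function names are hypothetical, and the
kurtosis is taken to be excess kurtosis (zero for a normal distribution), which is an
assumption suggested by the negative values in the table.

```python
import numpy as np


def cronbach_alpha(item_scores) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix."""
    x = np.asarray(item_scores, dtype=float)
    k = x.shape[1]
    sum_item_var = x.var(axis=0, ddof=1).sum()   # sum of the item variances
    total_var = x.sum(axis=1).var(ddof=1)        # variance of the scale score
    return k / (k - 1) * (1.0 - sum_item_var / total_var)


def skewness(scores) -> float:
    """Third standardized moment of the scores."""
    s = np.asarray(scores, dtype=float)
    z = (s - s.mean()) / s.std()
    return float(np.mean(z ** 3))


def excess_kurtosis(scores) -> float:
    """Fourth standardized moment minus 3."""
    s = np.asarray(scores, dtype=float)
    z = (s - s.mean()) / s.std()
    return float(np.mean(z ** 4) - 3.0)


# Synthetic stand-in: 500 respondents, 15 items sharing one common trait.
rng = np.random.default_rng(0)
trait = rng.normal(size=(500, 1))
items = trait + rng.normal(size=(500, 15))
scale_scores = items.sum(axis=1)

print(cronbach_alpha(items), skewness(scale_scores), excess_kurtosis(scale_scores))
```

Skewness and excess kurtosis close to zero are what "nearly normally distributed"
refers to in the paragraph above.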


Fifteen facets and the Big Five 


For the set of 105 adjectives thus obtained, the following facets for each trait
factor were defined on the basis of the Parsimax solution presented in Table 3. The
trait factor E has four facets, called Friendliness (E1), Sociability (E2), Activity
(E3), and Gregariousness (E4). The trait factor A has three facets, called
Understanding (A1), Generosity (A2), and Gentleness (A3). The trait factor C has two
facets, called Reliability (C1) and Orderliness (C2). The trait factor N has three
facets, called Toughness (N1), Stability (N2), and Happiness (N3). Finally, the trait
factor O has three facets, called Imagination (O1), Competence (O2), and Quickness
(O3). The names for the facets are based on the studies of Goldberg (1999) and of
Saucier and Ostendorf (1999). The three facets for Neuroticism are given names that
are the reverse of the trait factor. Table 5 contains the facet names together with
their factor loadings.

Varimax was applied to the data based on the fifteen facets after reflecting the
signs of the scores for the adjectives evaluated negatively in Table 3, except for the
adjectives of the trait factor Neuroticism. The number of factors was of course five.


Table 4. Statistics for 105 Japanese Big Five Adjectives 


Traits                    E      A      C      N      O
number of items          28     19     15     21     22
α-coefficient           .93    .87    .83    .90    .90
mean                   5.37   6.54  -1.03  -6.50   1.03
standard deviation    11.68   6.04   5.22   8.97   8.26
skewness               -.26   -.74    .09    .39    .08
kurtosis               -.54    .68   -.91   -.69   -.28


Table 5. Big Five Varimax solution in the fifteen facets of the 105 Japanese Adjective List (left
part); Primary Pattern by the Incomplete Oblique Procrustes rotation method of Jöreskog (right
part)


                           E     A     C     N     O        E     A     C     N     O
I (Extraversion)
E1 (Friendliness)        .83   .04   .13  -.27  -.07      .94  -.01   .20  -.16  -.33
E2 (Sociability)         .84   .07  -.10   .00   .22      .86   .05  -.08   .10   .06
E3 (Activity)            282/10 0103 095025               .84  ..03   .06   .02   .06
E4 (Gregariousness)      .84   .00  -.16  -.09   .15      .87  -.02  -.13   .00  -.01
II (Agreeableness)
A1 (Understanding)       232087752219 sith tk ‚28 |75 07 18 ОВ
A2 (Generosity)         -.16   .67   .31  -.16  -.24     -.13   .68   .21  -.15  -.31
A3 (Gentleness)          .00   .85  -.01  -.04   .18     -.13   .90  -.22  -.04   .19
III (Conscientiousness)
C1 (Reliability)        -.08   .18   .82  -.22  -.02      .00   .08   .85  -.12  -.18
C2 (Orderliness)        -.01 2159: 7395140733            -.01   .04   .74   .26   .23
IV (Neuroticism)
N1 (Toughness)           .25   .00   .15  -.81   241      .16  -.05   .08  -.78   312
N2 (Stability)           .06  -.06   .24  -.85   .21     -.04  -.13   .17  -.84   .15
N3 (Happiness)           .09   .14  -.27  -.80   .02     -.02   .17  -.39  -.86  -.02
V (Openness to Experience)
O1 (Imagination)         .21   .03   .04  -.01   .75      .02  -.01  -.04   .04   .79
O2 (Competence)          2162052273025 1822279           -.03  -.04   .21  -.10   .79
O3 (Quickness)           .10   .10  -.02  -.20   .77     -ЛА    .07  -.16  -.17   .85


Note: The values over |.35| are shown in boldface. 


The results are presented in the left half of Table 5. They are neatly organized in
terms of Big Five simple structure. In order to see the degree of the interdependencies
among the Big Five and to be able to compare them to the results of Table 2, the
incomplete oblique Procrustes factor rotation method of Jöreskog was applied to the
same data. The row vectors [?,0,0,0,0], [0,?,0,0,0], [0,0,?,0,0], [0,0,0,?,0], and
[0,0,0,0,?] in the target matrix are given for the facets of the traits Extraversion,
Agreeableness, Conscientiousness, Neuroticism, and Openness to Experience,
respectively, where the question mark "?" and the zero "0" represent the unknown and
zero elements. The result is presented in the right half of Table 5. Although some
complexity appeared in facet N3 (Happiness) in the primary pattern, the results are
not really different from those in the left part of this table. However, when the
correlations among the primary axes, as presented in Table 6, are compared to those in
Table 2, only the linear relation of the trait factor E to the trait factor O turned out
to be substantial.


Table 6. Correlations among Primary Axes for the Big Five in 105 Japanese Adjectives


       A      C      N      O
E    .11   -.03   -.22    .46
A           .34   -.10    .17
C                 -.24    .28
N


Note: The values over |.35| are shown in boldface 


Discussion 


Hierarchical relations between traits and facets 


The Parsimax solution of Table 3 indicates that some adjectives are complex in
terms of simple structure. For example, for the trait factor Extraversion, Merry
(Kaikatsuna) loads simultaneously on the three facets E3 (Activity), E2 (Sociability),
and E4 (Gregariousness). In comparison, the trait variable Extraverted (Gaikoutekina)
loads only on the facet E3 (Activity). The trait adjective Extraverted may
therefore be considered subordinate to Merry, in terms of a hierarchical relation, as
both of them load on the facet E3 (Activity). The importance of hierarchical
relations between trait factors and facets from a factor analytic point of view, and the
possibility of searching for such relations, are emphasized for future studies of
personality.


Statistics for Study-2 


In Table 7 the statistics for the Big Five facets of the 105 Japanese Adjective List
corresponding to the Parsimax solution are presented. Although the values of the
α-coefficients range from .62 to .86, it should be noted that their average is .78, as
expected before starting this study. The lowest value, .62 for the facet O3
(Quickness), is thought to result from its having the smallest number (four) of
adjectives. The scores for the fourteen facets are nearly normally distributed, though
those for facet A1 are relatively sharper in terms of kurtosis. And, although the
number (four) of items for the facet O3 and the number (five) for facet A3 should
possibly be increased, the fifteen facets obtained here are very suitable for
evaluations through their profiles together with the Big Five traits.
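
The link between the number of items and the α-coefficient can be illustrated with
the Spearman-Brown formula. This formula is not used in the chapter; it is brought in
here as an assumption, and it presupposes that any added items would behave like the
existing ones. The function name is hypothetical.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Projected reliability when a scale is lengthened by `length_factor`."""
    return length_factor * reliability / (1.0 + (length_factor - 1.0) * reliability)


alpha_o3 = 0.62                               # reported for facet O3 (four items)
projected = spearman_brown(alpha_o3, 8 / 4)   # O3 lengthened from four to eight items
print(round(projected, 2))                    # 0.77
```

Under that assumption, doubling facet O3 to eight items would bring its coefficient
close to the average of .78 reported for the fifteen facets, which is in line with the
suggestion that the number of items for O3 should be increased.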


Table 7. Statistics for the Big Five facets on 105 Japanese adjectives (Parsimax)


Facet  number of items     α     mean      sd  skewness  kurtosis
E1                   9   .83     1.48    4.46      -.10      -.86
E2                   6   .78     1.14    3.01      -.32      -.60
E3                   7   .85     1.58    3.53      -.27      -.58
E4                   6    TA     1.17    2.62      -.40      -.26
A1                   8   .82     4.19    3.19      -.98      1217
A2                   6   .77      223    2.22      -.10      -.84
A3                   5   .70     2512   42.24      -.77       17.
C1                   9   .80    -1.55    3.31       .31      -.72
C2                   6   .74      .52    2.89      -.10      -.75
N1                   7   .81    -1.86    3.57       .29      -.99
N2                   7   .84    -2.08    3.77       .34      -.95
N3                   7  272.    -2.56   35127       .59      -.34
O1                   8   .79     1.67    3.23      -.14      -.37
O2                  10   .86    -1.22    4.61       .19      -.43
O3                   4   .62      .58    1.98      -.22      -.45


Concurrent validity: the Big Five versus psychoanalytic concepts 


As was mentioned in the introductory section, Kashiwagi (1999) discussed a
concurrent validity study based on both an adjective type list (Adjective List or AL)
and the sentence type list of the TEG (Tokyo University Egogram). That analysis
was based on the items of both inventories. In the present chapter, the discussion is
confined to the scales of both inventories. First, however, it is necessary to give
some more details about the TEG and the AL.

The TEG is a psychodiagnostic type of personality inventory widely used in Japan,
requiring self-ratings based on five kinds of psychoanalytic concepts. The five
TEG scales each consist of ten items of the sentence type. The scale concepts are
Nurturing Parent (NP), Critical Parent (CP), Adult (AD), Adapted Child (AC), and
Free Child (FC). Each concept is represented by ten items such as "You love others
as you do yourself" for NP, "Others say frequently that you are very strict to them as
well as to yourself" for CP, "Others say that you are very logical" for AD, "I am
very nervous about criticism of myself" for AC, and "Others say that you are cheerful"
for FC. The subjects are requested to respond in terms of "Yes", "Questionable",
and "No". These concepts of the TEG were confirmed through Varimax, and the
standardization of the TEG was performed (for details about basic statistics, see
Kashiwagi, 1999).

The AL scales are composed of fifty Japanese adjectives for the Big Five, which
were selected from the results of Table 1 in a nearly random way, as the adjectives
are very satisfactory in the sense of the Big Five. The items of the AL for the five
scales are presented in Table 8. The AL also has three answering options: "Yes",
"Questionable", and "No".

Self-ratings were collected from 250 subjects (165 men and 85 women), on both
the TEG items and the AL items, and these data were used to investigate the
interrelationships between the Big Five concepts and the psychoanalytic concepts. For


Table 8. Japanese Adjective List translated into English used for concurrent validation


Extraversion (E)
+ active, aggressive, effervescent, extraverted, talkative
- aloof, sober, shy, unsociable, quiet
Agreeableness (A)
+ affectionate, charitable, gentle, generous, sympathetic, good-natured, honest,
  not-self-centered
- antagonistic, harsh
Conscientiousness (C)
+ industrious
- rash, illogical, fickle, weary, irresponsible, disorganized, negligent, optimistic,
  unpunctual
Neuroticism (N)
+ insecure, anxious, high-strung, restless, fearful, tense, nervous, unstable,
  vulnerable, upset
Openness to Experience (O)
+ aesthetic, versatile, wise, proficient, perceptive, creative, independent, smart,
  flexible, accommodating


this purpose, two joint factor analyses were carried out on the scales from both
inventories, each followed by an application of the incomplete orthogonal Procrustes
factor rotation of Browne (1972).

The first analysis involved the application of Browne's method, but restricted to
the AL scales among the scales from both questionnaires. The second analysis
involved the application of the same method, but restricted to the TEG scales.
Preceding the factor rotations, the signs of all the scores for the adjectives that
were evaluated negatively in the AL were reversed.

The target matrix for the first analysis is presented in the upper left part of Table
9. In both analyses, the sum of squares of the rotated factor loadings corresponding
to the zero elements in the target matrix was minimized in the least squares sense,
under the constraint that the rotated factors remain orthogonal. The rotated result
for the first analysis is presented in the upper right part of Table 9. This may
suggest that the TEG scales for the psychoanalytic concepts can be understandably
described circumplexically in terms of the Big Five, as the AL scales for the Big Five
are, factor analytically, mutually independent. In other words, the concept NP in the
TEG can be described in everyday words by such terms as active and affectionate, the
concept CP can be described by words such as harsh and high-strung, the concept AD
can be described by such words as wise and proficient, the concept AC can be
described by shy, fickle, and nervous, and the concept FC can be described by active
and effervescent. As expected, the mean of the α-coefficients for the AL scales of
the adjective type (.81) is larger than the one for the TEG scales of the sentence type
(.75).
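
The role of the target matrix in the first analysis can be sketched as follows. The
sketch only builds a partially specified target and evaluates the least squares
criterion described above; it does not implement Browne's (1972) rotation algorithm.
Leaving the five TEG rows entirely unspecified is a reading of the phrase "restricted
to the AL scales", and the loadings, the trial rotation, and the function names are
hypothetical.

```python
import numpy as np


def build_target(n_factors: int, n_free_rows: int) -> np.ndarray:
    """One constrained row per factor, followed by fully unspecified rows.

    NaN stands for the unknown element "?", 0.0 for a specified zero.
    """
    constrained = np.zeros((n_factors, n_factors))
    np.fill_diagonal(constrained, np.nan)
    free = np.full((n_free_rows, n_factors), np.nan)
    return np.vstack([constrained, free])


def zero_target_criterion(loadings: np.ndarray, target: np.ndarray) -> float:
    """Sum of squared loadings at the positions where the target is zero."""
    zero_mask = target == 0.0          # NaN compares unequal, so "?" cells drop out
    return float(np.sum(loadings[zero_mask] ** 2))


# Five constrained rows (the AL scales) and five free rows (the TEG scales).
target = build_target(n_factors=5, n_free_rows=5)

# Hypothetical loadings: five marker scales with perfect simple structure,
# followed by five further scales with arbitrary loadings.
rng = np.random.default_rng(0)
loadings = np.vstack([0.9 * np.eye(5), rng.uniform(-0.6, 0.6, size=(5, 5))])

# A trial orthogonal rotation of the first two factors by 20 degrees.
angle = np.deg2rad(20.0)
rotation = np.eye(5)
rotation[:2, :2] = [[np.cos(angle), -np.sin(angle)],
                    [np.sin(angle), np.cos(angle)]]

before = zero_target_criterion(loadings, target)
after = zero_target_criterion(loadings @ rotation, target)
print(before, after)
```

In this constructed example the criterion is zero in the starting position and
becomes positive after the trial rotation; a Procrustes procedure of the kind
described searches over orthogonal rotations for the position in which this quantity
is smallest. The loadings of the unspecified rows do not enter the criterion, which is
why those scales can afterwards be described in terms of the constrained ones.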

The AL scales were evaluated from the perspective of the TEG scales in the second
analysis. The target matrix for the second analysis is presented in the left side of
the middle part of Table 9. The rotated result for the second analysis is presented in
the right side of that table. This may suggest that the AL scales can be described
circumplexically in terms of the four dimensions of the psychoanalytic TEG scales
except for NP or for FC. In other words, it may be said that the psychoanalytic
concepts of both NP and FC are rather closely related to each other.


Table 9. Incomplete Orthogonal Procrustes Factor Solutions for AL and TEG scales


(Constrained by AL Scales) 


Scales Target E A (6 N о a 
E ?0000 .87 .00 .04 .16 25 86 
A 02000 .00 .88 24 .07 .07 72 
С 00? 00 .04 23 .93 47 EET 82 
N 00070 „16 .07 .18 .87 14 82 
о 0000? 25 .07 42 14 .87 .82 
mean of a =.81 
NP ёрге 53 61 12 121 24 .78 
CP ipe T .04 -.75 407 -.56 .07 67 
АР demie .08 14 34 15 83 .73 
АС О. -.45 .20 -.35 -.62 126 .80 
FC 304929 .80 22 -.29 .19 31 .79 


mean of a =.75 
(Constrained by TEG Scales) 


Scales Target NP CP AD AC FC 
E Qum 57 МИ .08 -.49 .68 
А 0270207 .56 -.68 .21 -.14 -.05 
С Du uuu .48 -.22 .34 -.56 -.49 
N ПГ? -.23 -.53 .19 -.66 „20 
[9] ЛЕРГЕ? 11 .04 .81 -.20 .37 
NP 20000 .75 -.19 .20 .04 236 
CP 02000 -.15 .92 -.06 .09 -.07 
AD 00?00 17 -.06 .87 -.24 .11 
AC 000?0 .04 .10 -.25 .83 -.26 
ЕС 0000? :33 -.07 .12 -.26 .82 


The orthogonal rotation matrices from the initial principal axes solution to the
rotated orthogonal solutions were obtained for both analyses. From these, the
orthogonal rotational interrelations between the Big Five personality traits in the AL
scales and the five psychoanalytic TEG scales were obtained: the orthogonal rotation
matrix obtained for the second analysis was transposed, and it was post-multiplied by
the orthogonal rotation matrix for the first analysis. The result is presented in the
correlation matrix of Table 10.
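
The matrix product described above can be sketched numerically. The two rotation
matrices below are random orthogonal matrices standing in for the ones obtained in
the two analyses, which are not reported; the names are hypothetical. The sketch
assumes that the columns of each rotation matrix hold the rotated axes expressed in
the coordinates of the common principal axes solution.

```python
import numpy as np


def random_orthogonal(m: int, seed: int) -> np.ndarray:
    """A random m-by-m orthogonal matrix (stand-in for a rotation matrix)."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.normal(size=(m, m)))
    return q


T_al = random_orthogonal(5, seed=0)    # rotation of the first analysis (AL target)
T_teg = random_orthogonal(5, seed=1)   # rotation of the second analysis (TEG target)

# Rows index the TEG axes, columns the AL axes.
R = T_teg.T @ T_al

# Each entry is the cosine between one TEG axis and one AL axis.
cosine_ad_o = float(T_teg[:, 2] @ T_al[:, 4])
print(np.isclose(R[2, 4], cosine_ad_o))
```

Because both sets of axes are orthonormal in the same factor space, the cosines
between them can be read as correlations, and the product is itself an orthogonal
matrix, so each row has a sum of squares of one. The NP row of Table 10, as printed,
is consistent with this: .50² + .55² + .45² + .49² + .03² = .996.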

The result explained above, based on the scales for both the AL and the TEG, is
essentially similar to the one based on the items of both the AL and the TEG
(Kashiwagi, 1999), which was mentioned in the first part of this section. Although the
previous result based on the items is not shown here, the present analysis based on
the scales indicates that the psychoanalytic concept AD in the TEG corresponds almost
perfectly to the trait O in the AL. The other four psychoanalytic concepts may be
described circumplexically in terms of the remaining four Big Five personality
factors.


Table 10. Correlations between AL scales and TEG scales


                    AL scales
Concepts      E      A      C      N      O
NP          .50    .55    .45   -.49    .03
CP            2   -.75    .03   -.62    .15
AD         -.20    .09    .23    .05    .95
AC         -.44   -.35   -.58   -.59   -.04
FC          .69    .07   -.64   -.18   -.28


Final conclusion 


It is concluded that the Big Five dimensions of personality traits were confirmed in
Japanese. Although the results presented in Table 2 suggested some oblique
relations among the Big Five factors through the incomplete oblique Procrustes factor
rotation, their almost perfect mutual orthogonality was confirmed in the results based
on about eighty-eight percent (105 of 120) of the items. This was the case through the
application of both Varimax for the Big Five factors and Parsimax for the related
fifteen facets (see Table 5 and Table 6).

It is emphasized that the sentence type is to be preferred over the adjective type
for personality inventories. The preferred format should be the short
behavior-descriptive sentence, so that subjects are able to respond more exactly and
more easily. Still, profiles or results obtained may very often be read and
interpreted in related Big Five adjective terms. When the short behavior-descriptive
sentence type is used for personality inventories, it may therefore help to use Big
Five adjectives as well for their interpretation in terms of the Big Five. It thus
remains important to continue using adjective lists like the one in this study in
personality research.


BIG FIVE ASSOCIATED INSTRUMENTS


Chapter 14 


The Hogan Personality Inventory 


Robert Hogan 
Joyce Hogan 


Introduction 


We began developing the Hogan Personality Inventory (HPI) in 1976 for two
reasons. First, from 1964 to 1976, we worked with the California Psychological
Inventory (CPI: Gough, 1975), and by 1976 we had a substantial set of archival data
on hand. We had read Norman's (1963) research but regarded it as lacking practical
significance. To test this view, we constructed CPI content scales based on
Norman's taxonomy, and reanalyzed our archival data. We found that the CPI
content scales substantially outperformed the standard scales, and this persuaded us
that future inventories of normal personality should be based on the Five-Factor Model
(FFM: Wiggins, 1996). Second, although we had no desire to develop a personality
inventory (it is too much work), the major inventories at the time were perhaps 30
years old, and none of them was likely to be reconfigured in terms of the FFM. We
began working on the HPI as a teaching exercise in a graduate course in
psychometrics, and one thing led to another. We arrived at the current (1995) version
of the inventory through a constant process of evaluation and revision over a period
of 20 years.

We began work on the HPI while we were at Johns Hopkins University and we
originally called it the Hopkins Personality Inventory (HPI). We moved to the
University of Tulsa in 1982; we had published papers using the term HPI, so we
retained it, with one small modification. The HPI differs from the other inventories
described in this book in seven ways, and each of these differences is important.

1. The HPI has a well-defined conceptual foundation: it is based on Socioanalytic
theory (Hogan, 1983; 1991; 1996). Socioanalytic theory combines elements of
traditional interpersonal theory (Carson, 1969; Leary, 1957; Sullivan, 1953;
Wiggins, 1979) with evolutionary theory, and is intended to explain individual
differences in interpersonal competence and effectiveness. The theory is based on five
key assumptions: (1) personality must be understood in the context of evolutionary
theory; (2) people evolved as group-living and culture-using animals; (3) the most
important human motives facilitate group living and enhance individual survival; (4)
people are compelled to interact, and interaction involves negotiating for acceptance
and status; and (5) some people are more effective at these negotiations than others
(Hogan, 1996; Hogan, Jones, & Cheek, 1985).

The theory starts with two generalizations about social life: (1) people always live
(work) in groups, and (2) groups are always structured in status hierarchies. This
leads to two conclusions about human motivation. First, people need acceptance and
approval from others (and want to avoid rejection); this need is manifested in terms
of behavior designed to allow one to "get along" with other members of the group.
Second, people need status and the control of resources (and want to avoid losing
them); this need translates into behavior designed to allow one to "get ahead" or
achieve status within the group(s) where one lives. These generalizations make sense
in Darwinian terms: historically, people who could not get along with others and
who lacked status and power had reduced opportunities for reproductive success.

Anthropologists (cf. Redfield, 1960, p. 345) tell us that all societies require their
members to work on "getting a living and living together." "Getting a living"
concerns completing crucial life tasks, and "living together" concerns maintaining
group solidarity. Social psychologists (cf. McGrath, Arrow, & Berdahl, 2000) tell us
that group living allows people: "(a) to complete group projects and (b) to fulfill
member needs. A group's success in fulfilling these two functions affects the viability
and integrity of the group as a system" (p. 98). Small group research (cf. Forsyth,
1990; Mann, 1959) tells us that people provide their groups with task inputs and
socio-emotional inputs. Task inputs promote the achievement of group goals, whereas
socio-emotional inputs promote group solidarity. Personality is the heart of these
processes, because the core of personality binds people to the groups where they live
and work. People differ in their strategies for getting along (living together) and
getting ahead (getting a living). These strategies, and the deep needs to get along and
get ahead which they serve, are the core of personality.

Socioanalytic theory distinguishes between personality from the perspective of
the actor and personality from the perspective of the observer. Personality from the
actor's view is a person's identity, and it reflects his/her hopes, dreams, fears,
aspirations, and career intentions. Identity is a person's self-view; self-views are
unique and idiosyncratic, and they are difficult to study empirically. On the other
hand, personality from the observers' view is a person's reputation. Reputation is
defined in terms of trait evaluations (friendly, helpful, talkative, competitive, calm,
etc.). Reputation reflects, from an observer's perspective, an actor's characteristic
ways of behaving in public. The Five-Factor Model (FFM) represents the structure of
observers' ratings based on 75 years of factor analytic research (cf. Goldberg, 1993;
Thurstone, 1934); that is, the FFM is a taxonomy of reputation (cf. Digman, 1990;
Saucier & Goldberg, 1996). The observer's view of an actor's personality is easy to
study and can be assessed reliably using ratings.

Identity is the person you believe you are; reputation is the person we believe you
are. Your identity guides your behavior during social interaction; your reputation is
the result of how we evaluate your performance after social interaction. Although
identity is hard to measure, reputation is easy to measure and it is inherently valid,
because the best predictor of future behavior is past behavior (Hough & Oswald,
2000), and reputation is based on past behavior.

2. The HPI is based on a theory of item responses that is very different from the
standard self-report view of item responses (cf. Hogan & Hogan, 1998). From our
perspective, the psychological processes involved in responding to questionnaire
items are formally identical to those that guide social interaction more generally.
People use their item responses to tell others how they want to be regarded (e.g.,
as calm, ambitious, hardworking, flexible, or enthusiastic). That is, people respond to
items in terms of their identity, their theory of what they stand for. As such, item
responses are self-presentations, not self-reports. When a test is scored, item
responses are in essence interpreted by an anonymous observer behind the
questionnaire, i.e., the scoring key (Hogan & Hogan, 1998, p. 39). Reputations are the
result of a person's self-presentations being evaluated by others, and profiles on
well-developed personality inventories predict reputation.

In summary, people do not respond to questionnaire items in terms of putatively
veridical self-reports. Rather, they respond to questionnaire items in the same way
that they respond to interview questions. They use their identities to guide their
responses, and their responses (self-presentations) are intended to tell the
"interviewer" how they want to be seen. Our model of item responses follows directly
from Socioanalytic theory; thus, we can explicitly account for our own data base.

3. The HPI shares the measurement goals of the California Psychological Inventory
(CPI; Gough, 1975). From the original publication of the CPI to the present,
Gough has maintained that it is designed to predict two classes of phenomena: (a)
indices of competence and effectiveness; and (b) how people will be described by
persons who know them well. We adopted these measurement goals because they
are pragmatic, they lead to real world pay-offs, and they explicitly focus on validity.

We believe that most test authors do not take validity seriously, although they
would obviously disagree with our characterization. Test authors in the factor
analytic tradition define validity as the degree to which the factor structure of their
inventories will replicate across samples. Others, relying on the rather ill-defined
concept of "construct validity", evaluate the validity of their inventories in terms of
correlations with other tests and scales. Still others rather preposterously define
validity in terms of the degree to which clients agree with their assessment results
(cf. Hogan & Hogan, 1998).

Following Gough (1965), we define validity in terms of the number of empirically
justified inferences we can make about a person on the basis of his or her score
on a scale or measure. The more valid inferences we can make, the more valid the
measure. In our view, assessment has a job to do: to predict real world outcomes.
People's success in life, and how others describe them, are crucial outcomes, and
validity is the critical index of how well an assessment device is doing its job.

4. The HPI further specifies what important real world outcomes it tries to pre- 
dict. As stated in the first section, we believe that there are two broad classes of out- 
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comes that are crucial in human affairs. The first concerns the degree to which a 
person is liked, admired, respected, and accepted in his/her local community. Alter- 
natively, it is sometimes useful to be able to identify persons who are loathed, fea- 
red, or shunned — and perhaps understand why they are seen in that way. The se- 
cond outcome concerns the degree of status, power, and/or control of resources a 
person has been able to attain, or is likely to attain, in his/her local community. This 
includes identifying life's potential winners and losers. This information can be used 
for a variety of strategic purposes including hiring, promoting, and developing 
people.

We can summarize the foregoing by saying that the HPI is designed to assess in-
dividual differences in people's potential for getting along and getting ahead in the
groups where they live and work. 

5. The HPI was developed almost exclusively using data from working adults. 
Working adults provided the data for the original, and for the subsequent, develop- 
mental psychometrics. Working adults provided the normative data for the HPI. 
Current norms are based on over 300,000 working adults, and the norms are broken 
out separately for most relevant occupational categories: gender; race/ethnicity; age; 
status; applicants vs. incumbents, etc. Finally, the HPI was validated exclusively 
using data from working adults. To date we have conducted over 300 local validity 
studies in which HPI scale scores for job applicants or incumbents were compared 
with a wide variety of performance data, including supervisors' ratings, absente- 
eism, commendations, dismissals, subordinate appraisals, training performance, 
promotions, etc. 

We have validity data for most of the common jobs in the U.S. economy, and we 
have studied jobs ranging from janitor to Chief Executive Officer, and from nanny 
to bomb disposal technician. In every study, the relevant HPI scales and/or subscales 
— those scales pertinent to the criteria in question — significantly predicted perfor- 
mance. The more reliable the performance criteria, the more strongly did the HPI 
forecast performance. 

6. Reflecting our view that assessment has a job to do, the HPI is designed to pre- 
dict significant outcomes — including how a person is described by others. At stake 
here is an important issue in the philosophy of science. Consider what has happened 
in psychological measurement since its beginning. Binet developed his test to fore-
cast educational or training outcomes and he never mentioned the word "intelligen-
ce." Under the influence of Spearman and subsequent writers, the field of cognitive
assessment changed from forecasting significant outcomes to measuring intelligen- 
ce, a concept that has yet to be rigorously defined. And in the process of measuring 
intelligence, researchers tend to ignore validity. Similarly, the MMPI and the CPI 
were developed to predict significant outcomes. Under the influence of factor ana- 
lysts, the field of personality assessment went from predicting outcomes to measu- 
ring traits, hypothetical entities that have yet to be discovered. The field moved from 
predicting something useful to measuring something that may not exist, and along 
the way it redefined the concept of validity so that predicting outcomes was no lon- 
ger important for evaluating validity. 

Trait theory assumes that traits are real, neuropsychic entities that literally exist,
although they have yet to be discovered. Moreover, the strength of these traits con- 
trols item endorsements, so that a person's score on a personality measure reflects, 
in a point for point manner, the strength or intensity of the underlying trait. In this 
model, a person's aggressive behavior is explained in terms of a trait for aggressive- 
ness. This is, of course, completely circular, but more importantly, it shuts down any 
subsequent debate about the causes of aggression. For example, we would argue that 
aggressive behavior is a form of self-presentation, it is an effort to tell others that 
one is not to be trifled with, that one is, somehow, a dangerous person. In our view, 
aggressiveness is a function of a person's identity and social learning experience 
rather than (necessarily) a function of a person's traits. 

To say that behavior is explained by traits is to make a particular explanatory 
claim, and one that can be challenged by alternative explanatory accounts. In our 
view, purposive social behavior is best explained in terms of a person's intentions, 
not mythical neuropsychic entities. The intentions can then be further analyzed and 
broken down. No one believes more than we do in the importance of biology for 
providing ultimate explanations of human behavior. But at the everyday level, in-
tentions are more satisfying than traits as explanatory constructs.

7. Socioanalytic theory characterizes social behavior in terms of the two broad 
themes of "getting along" and "getting ahead". It follows that a personality inven- 
tory based on socioanalytic theory should have two dimensions. In our view, the 
dimensions of the FFM are all aspects of getting along and getting ahead (cf. Wiggins
& Trapnell, 1996). However, the task of predicting occupational outcomes requires 
more narrow band predictors than two or even five dimensions, and the HPI is about 
predicting practical outcomes. The HPI started as a classroom exercise designed to 
evaluate the FFM. Taking each dimension of the FFM one at a time, we asked what 
kinds of things would a person say or do so as to make others describe him/her as 
(for example, in the case of Adjustment) well or poorly adjusted. Among other 
things, we concluded this would involve seeming anxious, depressed, stressed, inse- 
cure, poorly attached, and having a lot of physical complaints. We wrote items for 
each of these components, called the components a dimension, and then moved on 
to the next dimension. 

After writing items for the components of each dimension, we had over 420 
items. The items were grouped into Homogenous Item Composites (HIC; Zonder- 
man, 1980). Each composite was a coherent set of 3 to 5 items reflecting a single 
theme — e.g., anxiety, depression, stress, insecurity, etc. Our intent from the outset 
was to conduct validity research at the HIC level; we realized, however, that most 
test users are accustomed to interpreting profiles based on scales, so we set about 
composing scales. We began in a rational way, but once we had enough data, we 
carefully evaluated the factor structure of the HICs. We finally decided upon a 
seven-factor solution as optimal, and at no point was a five-factor solution ever 
contemplated; none of them fit the data. We believe that people think about them-
selves in a more complicated manner than they think about others. Consequently,
although a five factor solution may be appropriate for rating other people, the factor 
structure of self-descriptions is necessarily more complex. 

Thus, although we started with the FFM, we found a five factor solution did not
fit our data. We think the FFM is the indispensable starting point for inventory con-
struction, but it is not a structure that naturally inheres in nature.


Technical Features of the Hogan Personality Inventory 


Guided by socioanalytic theory, the HPI is designed to predict individual differences 
in getting along and getting ahead in occupational settings. Although the inventory 
has broader uses (Axford, 1996), our research has been almost exclusively with job 
seekers and employed adults. Applications of the HPI include personnel selection, 
individualized assessment, placement, promotion, training, coaching and develop- 
ment, employee orientation, and career planning. 

Our goals for the HPI also included certain desired operational features. We 
wanted an item pool that could be completed quickly. We wrote items to be easily 
understood by a wide range of people, trying to balance reading level with item 
complexity. We considered how test takers would react to the items, and tried to 
avoid items that were potentially invasive or intrusive. We wanted to assess the 
"bright side" of personality, i.e., to focus on characteristics that facilitate or inhibit a 
person's ability to get along with others and achieve his/her occupational goals, and 
we wanted to avoid the domain of psychopathology. We were committed to develo- 
ping a measure that, when used for selection purposes, would have no adverse im- 
pact. Most importantly, we wanted to identify HICs that predicted significant occu- 
pational outcomes, including job performance. Finally, we wanted to develop an 
instrument that, when used appropriately, would yield financial payoffs for the user 
— reduced turnover, increased productivity, reduced shrinkage, increased retention, 
etc. 


Description of the 1995 HPI 


The Hogan Personality Inventory (Hogan & Hogan, 1995) is a 206-item true-false 
measure of normal personality designed to predict performance in real world set- 
tings. The inventory contains seven primary scales constructed from 41 HICs, which 
are groups of items that form sub-themes of the broader scale. The number of HICs 
per scale ranges from four (School Success) to eight (Adjustment). There is no item 
overlap among the primary scales. The inventory also contains six occupational 
scales and a validity scale to detect careless responding. The Flesch-Kincaid reading 
level analysis indicates that the items are written at the fourth grade level. This dis- 
cussion is limited to the HPI primary scales; discussion of the occupational scales is 
presented in the HPI manual (Hogan & Hogan, 1995) and other publications (Hogan 
& Hogan, 1989; Hogan, Hogan, & Busch, 1984). 

Table 1 presents the seven HPI primary scales and their definitions.

Table 1. Hogan Personality Inventory Primary Scales and Definitions

Scale            Definition

Adjustment       The degree to which a person appears calm and self-accepting.
Ambition         The degree to which a person seems socially self-confident,
                 leader-like, competitive, and energetic.
Sociability      The degree to which a person seems to need and/or enjoy
                 interacting with others.
Likeability      The degree to which a person is seen as perceptive, tactful,
                 and socially sensitive.
Prudence         The degree to which a person seems conscientious, conforming,
                 and dependable.
Intellectance    The degree to which a person is perceived as bright, creative,
                 and interested in intellectual matters.
School Success   The degree to which a person seems to enjoy academic activities
                 and to value educational achievement for its own sake.

Three of these scales are directly aligned with FFM dimensions: Emotional Stability,
Agreeableness, and Conscientiousness are represented by HPI Adjustment, Likeabi- 
lity, and Prudence. However, the HPI Ambition and Sociability scales cover the do- 
main represented by the FFM Surgency dimension, and the HPI Intellectance and 
School Success scales cover the domain represented by FFM Intellect/Openness to 
Experience. The decision to include seven scales reflects the factor structure of the 
HICs. We intercorrelated scores on the HICs using a sample of 2500 employed 
adults. We decided on seven factors based on eigenvalues, a scree test, and the com- 
prehensibility of alternative solutions. We refined the components using orthogonal 
varimax rotation. The factor matrix for the HPI HICs, as they load on the seven 
factors, appears in the test manual (Hogan & Hogan, 1995, p. 11). 

We evaluated the internal consistency of the primary scales using Cronbach’s 
alpha and a sample of 960 adults. We evaluated test-retest reliability with a sample 
of 150 respondents over an interval greater than four weeks. The internal consisten-
cy reliability and the test-retest reliability, respectively, for each scale are: Adjustment
(.89/.86); Ambition (.86/.83); Sociability (.83/.79); Likeability (.71/.80); Prudence 
(.78/.74); Intellectance (.78/.83); and School Success (.75/.86). In Buros Mental 
Measurement Yearbook reviews, Axford (1996) and LoBello (1996) deemed the 
magnitude of these reliability coefficients adequate for research and application. 
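
As a reminder of what the first coefficient in each pair measures, the following sketch computes Cronbach's alpha for a small response matrix. The responses are invented true/false answers (1 = true, 0 = false), and the function name `cronbach_alpha` is ours for illustration; nothing here is taken from the HPI manual or its data.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])                  # number of items in the scale
    items = list(zip(*responses))          # transpose rows into item columns
    item_var_sum = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical data: 6 respondents answering 4 true/false items.
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

print(round(cronbach_alpha(data), 3))
```

Alpha rises as the items covary more strongly relative to their individual variances, which is why short scales with heterogeneous content tend to show lower values.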

Table 2 presents the intercorrelations among the HPI primary scales based on a 
sample of 30,016 employed adults. As seen, Adjustment is correlated with all the 
other scales except Sociability. Ambition is moderately correlated with all the other 
scales. Sociability is positively related to Intellectance and negatively related to Pru- 
dence. In addition, Likeability is associated with Prudence and Intellectance is asso- 
ciated with School Success. Adjustment, Ambition, Prudence, and Likeability form 
one cluster of scales. Sociability is a second, and Intellectance and School Success 
form a weak third cluster. Other than these correlations the scales are reasonably 
independent. 

Table 2. Hogan Personality Inventory primary scale intercorrelation matrix

                Adjustment  Ambition  Sociability  Likeability  Prudence  Intellectance

Adjustment = 


Ambition 25055 — 

Sociability -.10°* на sie — 

Likeability 46" .34** e pi — 

Prudence у 2240" -.31** 38° — 

intellectance qp ta 23722 .45** бх -.05** — 
School Success 22625 BTA .14** 91055 7215“ 388 


Note: N = 30,016; ** p< .01, two-tailed. 


Figure 1 shows raw score means for each of the HPI primary scales by gender 
and by race/ethnicity. Scale means and standard deviations as well as norms appear 
in the technical manual by gender, age, and race (Hogan & Hogan, 1995). The stri- 
king feature of Figure 1, which is corroborated by the descriptive statistics, is that 
there are no practical differences in scale scores’ means and standard deviations by 
race/ethnicity. However, for gender, females score lower than males on Adjustment, 
Ambition, and Intellectance. 


Validity of the HPI


Understanding the meaning of a psychological measure is an ongoing task. Meaning 
is defined through a cumulative process of test validation. There are a number of 
ways to clarify meaning, although validity has historically been defined in terms of 
correlations between test scores and criterion variables. Considerable attention is 
typically paid to the correlations, but little consideration is typically given to criteri- 
on adequacy. In our view, the concept of validity applies to both predictor and crite- 
rion measures — we need to validate our scales and our criteria. The process of try- 
ing to validate both sets of measures can lead to an infinite regress; nonetheless, we 
can't evaluate the validity of a personality scale based on the correlation between 
scale scores and any single criterion measure. We understand the meaning of a test 
score by: (1) placing the construct in a theoretical context; (2) understanding the 
latent structure underlying both test scores and criterion measures (Campbell, 1990; 
Hogan & Nicholson, 1988); (3) examining what the scale predicts (convergent vali- 
dity) and doesn't predict (divergent validity); and (4) gathering subsequent data to 
test further predictions. Theory provides an explanation for the covariations obtai- 
ned. 

Over the last 20 years, we have used four types of evidence to explore the con- 
struct validity of the HPI scales: (1) correlations with scales of other well-validated 

Figure 1. Hogan Personality Inventory Raw Scores by Gender and Race

Table 3. Links between the Hogan Personality Inventory and other Big Five measures


Source/ Hogan Personality Inventory Scales
Big-Five Measure Adj Amb Soc Lik Pru Int Sch


Goldberg (1992)/
Big-Five Factor Markers (N = 168)


Surgency .04 .59** „44“ S -.24“ 22915 -.03 
Agreeableness .13 -.11 .02 15655 13°” -.12 -.17 
Conscientiousness .10 .24“ -.26“ -.07 .36 -.17 -.08 
Emotional Stability One 3925 -.04 22755 .01 2858 11 
Intellect .05 :22** -.04 -.01 .03 337 „Зо 


Salgado and Moscoso (1999)/ 
Inventario de Personalidad de Cinco Factores (N = 200)


IP-Neuroticism .66** 190% .16 SERE 21245 26% -- 
IP-Extraversion .24** .60** 16255 2355 .04 ios -- 
IP-Openness sth) 44“ 25102 225" -.15 .69** -- 
IP-Agreeableness "2245 -.12 -.10 3375 02509 -.10 = 
IP-Conscientiousness 12242 23555 .08 2305. .49** .19 -- 


Goldberg (2000)/ 
NEO PI-R as part of IPIP project (N = 679)


Neuroticism 2725 2535 .08 227 72252 14“ .16** 
Extraversion 1615 .54** 16385 .44** -.06 24" .08 
Openness .01 72055 .38“ 7192 223125 Boy ae .24** 
Agreeableness 13050 -.12** -.24“ .47** .46** 70 -.08 
Conscientiousness .24** Зу -.05 .07 .42** .05 .16** 


Mount and Barrick (1995)/ 
Personal Characteristics Inventory (N = 154) 


Extraversion 23055 .64** .26** -.09 .04 .18 - 
Agreeableness 22555 .09 9018 22417" 2505 -.03 -- 
Conscientiousness .39** -.06 17. RES .24“ ‚08 -- 
Stability 15025 -.02 .46** ong .69** .06 -- 
Openness 236 aE 17 -.05 12 n E -- 


Note: ** p < .01, one-tailed 


tests; (2) correlations with respondent's ratings; (3) correlations with organizational 
criteria, and (4) meta-analyses of scale correlates with job performance criteria. We 
describe these next. 


Correlations with other tests 


We obtained HPI matched data sets for four categories of measures to determine the 
convergent and divergent validity of the HPI scales. Correlations should be highest 
between those measures purporting to assess the same construct and lowest between 
those measures assessing different constructs. Categories of tests included normal 
personality, dysfunctional personality, motives and interests, and cognitive ability. 

Table 4. Hogan Personality Inventory correlates with the Minnesota Multiphasic Personality Inventory-2

MMPI-2 Adj Amb Soc Lik Pru Int Sch

E [On -.20* 325 45" 512 .09 
F 2:215 -.13 18 -.12 -.12 02 -.23* 

К 59** 45** -.07 А155 Бе -.01 14 

Hs -.66** -.31** .04 -.18 -.19 14 -.13 

р -.48** -.56** z:33** -.22* .07 24752 -.05 

Hy = 305 .00 -.04 .03 215) -.07 .07 
Ра -.64"* -.33** .03 -.40** -.43** -.11 -.36** 
Mf :.3155 -.27* -.05 .05 18 -.29** 2425 

Pa -.47** -.28'* -.06 -.19 -.14 -.09 .05 
Pt -.76** -.63** -.02 -.46** 200055 -,24* -.26* 
Sc -.72** 597165 -.05 -.50** -.51“ -.19 -.26* 
Ma -.41** .05 ДО" -.15 507 23025 -.17 

5 -.42** А -,49** -.48** 21 азе" -.06 


Note: N = 71; * p < .05; ** p < .01, one-tailed.


Table 3 presents correlations between the HPI scales and other well-constructed
FFM-based personality inventories. Median correlation coefficients
summarize HPI relations with the NEO PI-R (Costa & McCrae, 1992; Goldberg, 
2000), Goldberg's (1992) Big-Five Markers (R. Hogan & Hogan, 1995), Personal 
Characteristics Inventory (Mount & Barrick, 1995b), and the Inventario de Persona- 
lidad de Cinco Factores (Salgado, 1998; Salgado & Moscoso, 1999). Descriptions of 
procedures, methods, and samples are contained in the technical documents referen-
ced above. The medians and ranges of correlations are as follows: Adjust-
ment/Emotional Stability/Neuroticism (median r = .73; range = .66 to .81); Ambiti-
on/Extraversion/Surgency (median r = .56; range = .39 to .60); Sociability/Extra-
version/Surgency (median r = .62; range = .44 to .64); Likeability/Agreeableness
(median r = .50; range = .22 to .61); Prudence/Conscientiousness (median r = .51;
range = .36 to .59); Intellectance/Openness/Intellect (median r = .57; range = .33 to 
.69); and School Success/Openness/Intellect (median r = .30; range = .05 to .35). 
Although the off-quadrant correlations are not presented in the table, these are lower 
than correlations between scales sharing the same underlying construct. These data 
suggest that findings based on the HPI will generalize to other well-constructed 
measures of the FFM. 
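
The summary statistics used above (a median and a range of convergent correlations, compared against off-construct correlations) can be sketched in a few lines. The correlation values below are invented for illustration and are not taken from Table 3; the helper names `summarize` and `exceeds_off_construct` are likewise our own.

```python
from statistics import median


def summarize(correlations):
    """Return (median, minimum, maximum) for a set of correlations."""
    return median(correlations), min(correlations), max(correlations)


def exceeds_off_construct(convergent, off_construct):
    """True if every same-construct r is larger than every off-construct |r|."""
    return min(convergent) > max(abs(r) for r in off_construct)


# Hypothetical correlations of one scale with four markers of the same construct.
convergent = [0.45, 0.52, 0.58, 0.63]
# Hypothetical correlations of that scale with markers of other constructs.
off_construct = [0.04, -0.13, 0.10, 0.24]

med, low, high = summarize(convergent)
print(f"median r = {med:.2f}; range = {low:.2f} to {high:.2f}")
print("convergent > off-construct:", exceeds_off_construct(convergent, off_construct))
```

Taking absolute values for the off-construct set is a design choice: a large negative correlation with an unrelated construct would be as much a problem for divergent validity as a large positive one.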

Tables 4 and 5 present correlations between the HPI scales and scales of the Min- 
nesota Multiphasic Personality Inventory-2 (MMPI-2; Hathaway & McKinley, 
1943; Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989) and the Hogan

Table 5. Hogan Personality Inventory and Hogan Development Survey primary scale correlates

HDS Adj Amb Soc Lik Pru Int Sch
Excitable -.69** n47* -.10** -.39'* -.31** -.10** 156 
Skeptical -.38** -.20** ‚02 -.28** -.28** -.02 13° 
Cautious -,50** -,66** -.30** +32 -.08** -.21**  -.20** 
Reserved -.28** = ЗА" -.35** -.57** -.17** -.45" 0823 
Leisurely --28" -.28** -.06* 2920055 -.14** .03 -.06* 
Bold -.01 22552 Que" .02 -.09** 185 12 
Mischievous .03 22:355 .45** .04 Зб 38 .08** 
Colorful .08** „40“ .60** 485 -.19** ‚23 31925 
Imaginative -.18** .06* rs -.02 2.3955 .29** 1095 
Diligent == 131“ -.05* -,14** .00 :33^^ -.01 -.07* 
Dutiful -.06* 02055 „12° 167 206 -.08** -.17** 


Note: N = 2,692; * p < .05; ** p < .01, one-tailed.


Development Survey (HDS; Hogan & Hogan, 1997), respectively. The MMPI was 
developed as an aid to psychiatric diagnosis and the original and revised versions are 
the most widely used tests of psychopathology in the world today. The HPI correla- 
tions in Table 4 are presented for the MMPI-2 basic validity keys and 10 clinical 
scales based on a sample of male and female police officer applicants (N = 71). As
expected, there are substantial negative correlations between HPI Adjustment and all 
of the standard scales of the MMPI-2. The MMPI-2 scales most saturated with neu- 
roticism — Hypochondriasis (Hs) and Psychasthenia (Pt) — yield the largest corre- 
lations with HPI Adjustment. Although HPI Adjustment is the scale most highly 
saturated with pathology, other correlations are also consistent with FFM predicti- 
ons. Notable are negative correlations between HPI Ambition/Sociability and MMPI 
Social Introversion, HPI Likeability and MMPI Schizophrenia, and HPI Prudence 
and MMPI Psychopathic Deviate. 

The HPI-HDS correlations in Table 5 were obtained from male and female job 
incumbents (N = 2,692) who completed the inventories during job coaching. The 
HDS is designed to assess eleven common dysfunctional dispositions derived, in 
part, from the DSM-IV, Axis 2 personality disorders taxonomy (American Psychia-
tric Association, 1994). Principal components analysis of the HDS yields three
components that correspond to Horney's (1950) concepts of "moving away
from people," "moving against people," and "moving toward people" (Hogan &
Hogan, 1997, p. 10-12). The distinguishing feature of Table 5 is the negative mani- 
fold of relations for the first HDS component — Excitable, Skeptical, Cautious, Re- 
served, and Leisurely — with all HPI scales. This suggests that low HPI Adjustment 
scores reflect a syndrome that includes unstable relationships, suspiciousness, social 
anhedonia, and sensitivity to criticism. This should lead to poor interpersonal beha-
vior that inhibits social relations and a person’s ability to get along with others and
achieve occupational goals. In addition, note the correlations between HPI Ambition
and Sociability with HDS Bold (Narcissistic) and Colorful (Histrionic) scales as well
as the relations between HPI Prudence and Diligent (Obsessive Compulsive) and
Dutiful (Dependent). These relations suggest shortcomings associated with high
scores on these scales.

Table 6. Hogan Personality Inventory and Motives, Values, Preferences Inventory primary scale
correlates

MVPI Adj Amb Soc Lik Pru Int Sch
Aesthetic -.20** 05%" 2305 .01 rise 39** 17** 
Affiliation 28 36“ 41” .38“ 15" EL 08“ 
Altruistic .10** .08** .03 .26"* 23" 12" .04* 
Commercial sj is fee 225^ 217 1325 asp 19° "OR 
Hedonistic -.34** -.16** 23345 -.02 -.39** 10725 -.08** 
Power .05** Tage 367 .06** .01 :26^* .20** 
Recognition -.18** 21125 Sie ‚01 -.19** 22:3** .06** 
Scientific -.09** -.04** 1055 -.02 -.04* 2275: Sunt 
Security .02 -.09** -.28** .03 3425 -.20'* exige 
Tradition 22075 2925 -.10** 1196 345 .04* .08“ 


Note: N = 2,692; * p < .05; ** p < .01, one-tailed.

Considering the relation between the HPI and assessments of interests, motives, 
and values, the correlations for Holland’s (1985) Self-Directed Search appear in 
Hogan and Hogan (1995, p. 23). These results are sensible; the most interesting pat- 
terns occurred for Adjustment with no significant relations with any interests and for 
Investigative with significant correlations for all interests except SDS Conventional. 
Table 6 presents correlations between the HPI scales and scales of the Motives, Va- 
lues, and Preferences Inventory (MVPI; Hogan & Hogan, 1996). The MVPI is a 
direct assessment of a person’s motives and the assumption underlying the asses- 
sment is that values and interests are motivational concepts. The ten MVPI scales 
represent those dimensions that have historic presence in the literature on motivati- 
on. As such, the MVPI is based on a comprehensive taxonomy of motivational con- 
structs. The HPI-MVPI correlations are based on a sample (N = 2,692) of male and 
female job applicants and incumbents. We hypothesized that there would be signifi- 
cant relations with every HPI scale, although some scales would be related to multi- 
ple motives and some relations would be negative. 

As seen in Table 6, HPI Adjustment is positively related to Affiliation motives 
and negatively related to Aesthetic motives. This pattern is consistent with the crea- 
tivity literature. HPI Ambition is most highly correlated with Power motives, while 
HPI Sociability is most highly correlated with Recognition motives. HPI Likeability
is correlated with both Affiliation and Altruistic motives. HPI Prudence is negatively
related to Hedonistic motives and positively related to Security motives. Finally, 
HPI Intellectance has its strongest relation with Aesthetic motives and HPI School 
Success has its strongest relation with Power motives. These data provide useful 
interpretive information for HPI scale scores in the area of predicting what is likely 
to motivate an individual and what occupational environment is likely to be a good 
fit. 

Tests of cognitive ability tend to be unrelated to the HPI scales with the exception 
of the Intellectance and School Success scales. We expect a modest positive correla- 
tion with Intellectance, because it contains a component of intellectual curiosity, and 
with School Success because it concerns interest in education and training. Hogan 
and Hogan (1995, p. 22) show that the only meaningful correlations between the 
HPI (r's ~ .20) and the Armed Services Vocational Aptitude Battery (ASVAB; U.S. 
Department of Defense, 1984) are with Intellectance and School Success. These 
results are corroborated by correlations between the HPI and several PSI Basic 
Skills Tests, the Industrial Reading Test, and the Watson-Glaser Critical Thinking 
Appraisal. 


HPI correlations with others’ descriptions 


Correlations between others' descriptions and HPI scale scores allow us to evaluate 
the validity of the HPI and the adequacy of socioanalytic theory—on which the HPI 
rests. In a practical sense, the links between scale scores and reputational descripti- 
ons provide the information for individual feedback and coaching. For example, 
these correlations allow us to say, with some degree of accuracy, that people with 
high scores for Ambition are likely to be described by others as outgoing, assertive, 
polished, and forceful. 

For this analysis we asked undergraduate and graduate student volunteers (N =
128) to complete the HPI and to distribute rating forms to two people who had
known them for at least two years. The rating form contained 112 adjectives from 
Gough and Heilbrun's (1983) Adjective Check List (ACL); these adjectives were 
identified by John (1990) as prototypical markers of the FFM dimensions. Respon- 
dents rated the target person using a 5-point Likert scale, where "1" indicated 
"strongly disagree" and “5” indicated "strongly agree." 

Correlations between the ACL items and the HPI scales appear in Table 7. The 
table lists the ten adjectives most highly correlated with each scale. As seen, these 
adjectives correspond closely to the scale definitions listed in Table 1 as well as the 
FFM dimension definitions that appear in John (1990). Close correspondence bet-
ween descriptors and definitions suggests that the HPI scales are assessing the re-
putational features they were intended to assess. The negative correlations in the
table are particularly interesting because they help interpret low scores and expand 
the understanding of the scale ranges. 

Table 7. Hogan Personality Inventory and adjectival correlates


Adjustment r Ambition R Sociability r 
Тепѕе -.53** Outgoing 31% Quiet -.45"* 
Worrying -,49** Shy zen ss Talkative .48** 
Moody -.46"* Retiring -.30** Shy -42" 
Unstable . -.43** Assertive 178° Outgoing 37“ 
Self Pitying -.39** Spunky 786° Silent -.37** 
Temperamental - 39" Polished 2955 Reserved cos bee 
Nervous . Eng jy Silent -.27** Show-off 337 
Fearful 2537s Active 2656 Spunky 2325» 
Self Punishing -.36'* Sociable 22665 Outspoken 23255 
High Strung 2:355 Forceful .24** Withdrawn 2325 
Likeability r Prudence R Intellectance r 
sympathetic .44** Noisy Agee Narrow Interests -.42** 
Praising .44** Through 38“ Ingenious .34“ 
Outgoing .43** Wise 2375s Artistic sie 
Soft-hearted oe fa Precise 037% Imaginative 230 
Enthusiastic 13755 Irresponsible -.36** Inventive 23022 
Sociable 23725 Stable .30** Sharp-witted 23055 
Friendly бїз Show-off -.34“ Active 22928 
Polished 33352. Cautious .30** Energetic .26** 
Sensitive 33°" Efficient Dc bis Witty .26** 
Pleasant 2915. Practical Sh Original До 
School Success r 

Narrow Interests OV 1 

Insightful .24“ 

Ingenious 223** 

Foresighted 2227 

Clever P245 

Good Natured -.22** 

Thorough 7192 

Precise .18* 

Touchy -.17* 

Painstaking .16* 


Note: N = 168; * p < .05; ** p < .01, one-tailed.


HPI correlations with organizational criteria 


The HPI is designed to forecast performance in real world settings; this has been a 
major focus of HPI validation research to date. The HPI has been used in hundreds 
of personnel selection studies to predict job performance. Using local validation 
research, we analyze the target job, develop hypotheses about the HPI components 
that should predict job performance, and develop criterion measures based on the job 
requirements. Next, we test applicants/incumbents and collect criterion data. Typi- 
cally, these data are supervisors’ ratings of the incumbents’ job performance and 
objective performance measures. We compute various statistics, including correlati- 
ons, to determine which HPI components are related to the criteria. Finally, we make 
recommendations for test implementation and follow-up. 
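
The correlational step of such a local validation study can be illustrated with a short sketch. Everything below is hypothetical: the scale scores, the supervisor ratings, and the function name `pearson_r` are invented for the example and do not come from any HPI study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical raw scores on one scale for eight incumbents.
scale_scores = [12, 18, 9, 22, 15, 20, 11, 17]
# Hypothetical supervisor ratings (1-5) on a criterion matched to that scale.
supervisor_ratings = [2, 4, 2, 5, 3, 4, 3, 3]

r = pearson_r(scale_scores, supervisor_ratings)
print(f"scale-criterion correlation: r = {r:.2f}")
```

In practice one such coefficient would be computed for each hypothesized scale-criterion pairing, so the job analysis determines which correlations are examined rather than the other way around.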

Building a data base one study at a time, we now understand the organizational 
criteria best predicted by the HPI scales. Table 8 shows an example of the empirical
links between the HPI scales and important organizational performance outcomes.
Note that there is no listing for criterion variables of overall or summary job perfor- 
mance. Our strategy for validation research is to align predictors with criteria using 
the underlying construct. Historically, researchers have paid considerable attention to
the measurement properties of predictors and virtually no attention to the adequacy of
criterion data, often gathering that which is convenient rather than that which is valid.
Campbell (1990) emphasized that the latent structure of constructs should extend 
across both predictor and criterion space and we organize our measures in this way. 
Therefore, the example criterion measures listed in Table 8 are criteria appropriate 
for their respective FFM dimensions. Matching test and criterion measures on the 
basis of a common construct is the key to maximizing the prediction of job perfor- 
mance. 

The striking feature of Table 8 is that, across a range of jobs, organizations, and 
criteria, the HPI is consistently related to job performance. Although some writers
regard conscientiousness as the only important FFM dimension (Ones, Viswesvaran,
& Schmidt, 1993; Schmidt & Hunter, 1998), it is clear that when criteria are
saturated with other personality-related content, the remaining FFM dimensions will 
also predict performance. When these results are compared to the mean validities 
reported by Barrick and Mount (1991), the virtues of aligning predictors with speci- 
fic criteria become clear. 


Meta-analyses of HPI Scales


We conducted a series of meta-analyses to answer the question of how well indivi- 
dual FFM dimensions predict criteria when they are aligned with the underlying 
construct (Hogan & Holland, 2001). We identified 43 independent samples (total N 
= 5,242) that met the following criteria: (1) the study used job analysis to estimate 
personality-based job requirements; (2) the study used a concurrent or predictive 
validation strategy with working adults; (3) the criteria were content explicit, not 
overall job performance; (4) the predictor variables were scales of the HPI; and (5) 
the sample was larger than N = 25. We excluded studies of the following types: (1)
studies using clinical patients and therapists; (2) studies using undergraduate or
graduate students; (3) studies using self-reported performance criteria; (4) studies
using performance criteria other than ratings and objective productivity/personnel
measures; (5) studies in which the only criterion was overall performance; (6)
laboratory or assessment center studies; (7) studies unrelated to work contexts; and
(8) dissertation research.

Table 8. Hogan Personality Inventory sample validation results

[Columns: HPI Scale, Source, Sample, Criteria, r. Rows are grouped under Adjustment,
Ambition, Likeability, Prudence, Intellectance, and School Success.]

Note: Sociability is not represented in the table because there were not enough
studies with sociability-based criteria for inclusion.


Subject matter experts (SMEs; N = 13) reviewed the criterion variables in each
study and identified the personality construct most closely associated with each
performance criterion. The seven HPI scale constructs were defined and SMEs were
asked to nominate only one scale for each criterion listed. Definitions of each
performance criterion came from the original validation studies. The result was a
nominal construct rating for each criterion, which allowed us to align the criteria with the
predictors based on their common meaning (Campbell, 1990). We calculated Kappa
to evaluate interrater agreement on nominal rating scales. For the seven personality
constructs, the Kappa value was .48, which is within the .40 to .60 range considered
as moderate to good interrater agreement.
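
The agreement statistic described above can be illustrated with a short sketch. The text does not say which variant of Kappa was computed; the code below assumes Fleiss' kappa, a common choice when more than two raters assign each item to exactly one nominal category. The function name and the toy ratings are hypothetical and are not taken from the study.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for nominal ratings.

    ratings: list of items; each item is a list holding one category
    label per rater. Every item must have the same number of raters.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number of ratings")

    counts = [Counter(item) for item in ratings]
    categories = set().union(*counts)

    # Observed agreement: proportion of agreeing rater pairs per item.
    p_items = [
        (sum(c * c for c in cnt.values()) - n_raters)
        / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_observed = sum(p_items) / n_items

    # Chance agreement from the overall category proportions.
    total = n_items * n_raters
    p_chance = sum(
        (sum(cnt[cat] for cnt in counts) / total) ** 2 for cat in categories
    )
    return (p_observed - p_chance) / (1 - p_chance)


# Toy data: three raters nominate one scale for each of four criteria.
toy = [
    ["Adjustment", "Adjustment", "Adjustment"],
    ["Ambition", "Ambition", "Prudence"],
    ["Prudence", "Prudence", "Prudence"],
    ["Ambition", "Adjustment", "Ambition"],
]
print(round(fleiss_kappa(toy), 2))
```

The statistic is undefined when every rating falls in a single category, because chance agreement is then 1.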

We used the meta-analytic procedures specified by Hunter and Schmidt (1990) to 
cumulate results across studies and to assess effect sizes. All studies used zero-order 
product-moment correlations; this eliminated the need to convert alternative statis- 
tics to values of r. Corrections were made for sampling error and unreliability in the 
measures. Reliability of the personality measures was estimated using within-study 
coefficient alpha, rather than using the values reported in the HPI manual. No cor- 
rections were made for range restriction in the predictors or the criteria. We used 
artifact distributions to correct for unreliability in the criterion measures because we 
did not have sufficient information to correct each study individually. Following 
Barrick and Mount (1991) and Tett, Jackson, and Rothstein (1991), we used the .508 
reliability coefficient proposed by Rothstein (1990) as the estimate of the reliability 
of supervisory ratings of job performance. For objective criterion data, we (conser- 
vatively) assumed perfect reliability, following Salgado (1997). The frequency- 
weighted mean of the job performance reliability distribution was .59, which is 
comparable to the value of .56 reported by Barrick and Mount (1991), and the mean 
square root reliability of .76 corresponds to the value of .778 reported by Tett et al.
(1991). We did not correct correlation coefficients to estimate validity at the con- 
struct level. 
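
The quantities involved in these corrections can be sketched as follows. This is a minimal illustration that assumes the standard bare-bones formulas associated with Hunter and Schmidt (sample-size weighting, expected sampling-error variance, and disattenuation for unreliability); it is not the authors' program, and all function names are illustrative. The inputs in the example (mean r = .20, average N = 132, SD = .077) are the Ambition row of Table 9, and .508 is the supervisory-rating reliability cited above.

```python
import math


def weighted_mean_r(rs, ns):
    """Sample-size-weighted mean correlation across studies."""
    return sum(r * n for r, n in zip(rs, ns)) / sum(ns)


def observed_variance(rs, ns):
    """Sample-size-weighted variance of the observed correlations."""
    r_bar = weighted_mean_r(rs, ns)
    return sum(n * (r - r_bar) ** 2 for r, n in zip(rs, ns)) / sum(ns)


def sampling_error_variance(r_bar, mean_n):
    """Variance expected from sampling error alone."""
    return (1 - r_bar ** 2) ** 2 / (mean_n - 1)


def correct_for_unreliability(r, rxx=1.0, ryy=1.0):
    """Disattenuate r for unreliability in predictor (rxx) and criterion (ryy)."""
    return r / math.sqrt(rxx * ryy)


# Share of observed variance attributable to sampling error (Ambition row).
var_e = sampling_error_variance(0.20, 132)
pct_explained = 100 * var_e / 0.077 ** 2
print(round(pct_explained))

# Effect of correcting an observed r of .20 for criterion unreliability only.
print(round(correct_for_unreliability(0.20, ryy=0.508), 2))
```

For the Ambition row, the sampling-error share computed this way comes out near the 119 listed in Table 9, although the authors' exact computation (which also involved artifact distributions) is not given in the text.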

Hunter and Schmidt (1990) point out that meta-analytic results can be biased un- 
less each sample contributes about the same number of correlations to the total. To 
eliminate this problem, correlations within studies were averaged so that each sam- 
ple contributed only one point estimate per predictor scale. This procedure takes into 
account both negative and positive correlations and avoids assumptions about using 
mean absolute values for averaging correlations. This is the major computational 
difference between these analyses and those presented by Tett et al. (1991), who
used mean absolute value correlations for within-study averaging. 
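
A toy example shows why the choice of within-study averaging matters. The correlations below are invented for illustration; the two helper functions are hypothetical names for the signed average described here and the mean absolute value average attributed to Tett et al. (1991).

```python
def mean_signed(correlations):
    """Average that lets negative and positive correlations offset each other."""
    return sum(correlations) / len(correlations)


def mean_absolute(correlations):
    """Average of absolute values, which ignores the sign of each correlation."""
    return sum(abs(r) for r in correlations) / len(correlations)


# One sample reporting three correlations for the same predictor scale.
study = [0.30, -0.10, 0.10]
print(round(mean_signed(study), 3))    # single point estimate carried forward
print(round(mean_absolute(study), 3))  # larger, because the sign is discarded
```

Whenever a study contains correlations of mixed sign, the absolute-value average is larger than the signed average, so the two procedures can yield different meta-analytic estimates from the same studies.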

Table 9 presents validity results for HPI scales aligned by construct-classified 
criteria. Forty-two meta-analyses were computed to evaluate convergent and diver- 
gent validity of construct-aligned measures. There were too few studies with criteria 
categorized as sociability-related to compute meta-analyses for the HPI Sociability 
scale. However, there were sufficient studies to compute meaningful analyses for all 
other scales. The estimated true validities range from .25 (HPI School Success) to 
.43 (HPI Adjustment). The lower bound confidence intervals are all greater than .10, 
which suggests that scale validity generalizes across samples when criteria are clas- 
sified by construct. In every case, the confidence intervals support the reliability of 
the validity coefficients. 
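
The 90% credibility values in Table 9 are consistent with the usual lower-bound formula, rho minus about 1.28 standard deviations of rho. The sketch below assumes that formula; the function name is illustrative, and the inputs are the Prudence, Likeability, and School Success rows of Table 9.

```python
from statistics import NormalDist


def credibility_value(rho, sd_rho, level=0.90):
    """Lower credibility bound: the value that the stated proportion of
    true validities is expected to exceed, assuming a normal distribution."""
    z = NormalDist().inv_cdf(level)  # about 1.28 for the 90% level
    return rho - z * sd_rho


print(round(credibility_value(0.36, 0.125), 2))  # Prudence
print(round(credibility_value(0.34, 0.100), 2))  # Likeability
print(round(credibility_value(0.25, 0.184), 2))  # School Success
```

When the SD of rho is .000, as for Ambition and Intellectance, the credibility value equals rho itself.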

Although not included in Table 9, we examined the convergent and discriminant
validity of the FFM measures. For each dimension, correlations are highest between
personality scales and the aligned, construct-specific criterion variables, and this
indicates convergence. The validity coefficient for HPI Adjustment (.43) is the largest
in the table. Similarly, validity coefficients are smallest for the personality scales
that are not aligned with the specific construct. For example, HPI Intellectance is
unrelated to adjustment, likeability, and prudence criteria; HPI Sociability predicts
none of the construct-based criteria. This pattern of lower correlations for the
off-diagonal scales supports discriminant validity. Another index of discriminant
validity comes from the overlap of the credibility values among scales. Except for HPI
School Success, no lower bound credibility values for construct-aligned measures
overlap any other scale, which suggests independence. This pattern of findings
provides further support for the discriminant validity of the predictor scales.


Table 9. Hogan Personality Inventory Meta-Analysis Results for Criteria Aligned by Construct

                   K   total N   avg N   r obs   SD obs    rho   SD rho   %VE   90% CV
Adjustment        24     2,573     107     .25     .114    .43     .117    62      .28
Ambition          28     3,698     132     .20     .077    .35     .000   119      .35
Sociability       na        na      na      na       na     na       na    na       na
Likeability       17     2,500     147     .18     .094    .34     .100    68      .21
Prudence          26     3,379     130     .22     .113    .36     .125    55      .20
Intellectance      7     1,190     170     .20     .037    .34     .000   357      .34
School Success     9     1,366     152     .15     .132    .25     .184    34      .01

Note: K = number of studies; total N = number of participants across K studies; avg N = average
number of participants within each study; r obs = mean observed validity; SD obs = SD of observed
correlations; rho = true validity at scale level; SD rho = SD of true validity; %VE = percentage of
variance explained; 90% CV = credibility value.


These analyses provide strong support for the contention that the HPI is a valid 
predictor of job performance. With the exception of Sociability, each HPI scale is 
useful in predicting organizational criteria saturated with the same construct. The 
scales predict less well or not at all organizational criteria saturated with different 
constructs. 


Summary 


We close our discussion by making (or remaking) four points regarding how the HPI 
differs from other personality inventories based on the FFM. First, the HPI is theory- 
based because it comes out of a theory of personality, not because it originated with 
an evaluation of the FFM. Personality theories begin with some assumptions about 
human nature. The FFM makes no such assumptions; consequently, it is a taxonomy 
of variables, not a theory of personality. We believe it is misleading to claim that a 
personality inventory is theory-based because it starts with the FFM. 

Second, the HPI is based on a fully articulated model of personality: Socioanalytic
theory. Socioanalytic theory argues that people are primarily motivated to
get along and get ahead in life, that there are individual differences in people's
ability to achieve these goals, and that these individual differences are captured in
people's reputations. The HPI is designed to predict individual differences in
reputation, which reflect individual differences in a person's success in life.

Third, the HPI is designed to predict individual differences in people's ability to
get along and get ahead, and to predict how people will be described by others. The
way a person is described by others is equivalent to that person's reputation.
Therefore, the HPI is designed to predict a person's reputation.

Finally, the HPI (along with the CPI) has a single-minded focus on validity. And 
we define validity in terms of the degree to which a measure predicts significant and 
important non-self-report performance outcomes. Thus, the HPI takes seriously the 
notion that assessment has a job to do. However, we are not advocating old-fashioned
dustbowl empiricism. The HPI combines a utilitarian focus with the ambitious
theoretical agenda embodied in Socioanalytic theory.


The Six Factor Personality Questionnaire


Douglas N. Jackson 
Paul F. Tremblay 


Introduction 


The Six Factor Personality Questionnaire (SFPQ; Jackson, Paunonen, & Tremblay, 
2000) extends and in certain ways redefines the popular Big Five factors. The im- 
mediate impetus for developing the SFPQ arose as a result of a series of confirma- 
tory factor-analytic studies using the scales of the Personality Research Form (PRF; 
Jackson, 1984), a published personality questionnaire that measures 20 variables of 
personality drawn largely from the work of Murray (1938). These factor-analytic 
studies, which are briefly described in this paper, revealed that a six factor solution 
consistently provided a better fit than did a five factor solution (Jackson, Paunonen, 
Fraboni, & Goffin, 1996). The factors are Extraversion, Agreeableness, Independ- 
ence, Openness to Experience, Methodicalness, and Industriousness. The fifth and 
sixth factors represent a division of the Big Five Conscientiousness factor into two: 
one factor reflecting Methodicalness and the other, Industriousness. These two fac- 
tors are correlated but distinguishable. In the SFPQ we provide the option of com- 
bining them into a general Conscientiousness factor. The SFPQ profile provides 
both a score for the Conscientiousness factor and a set of scores for Methodicalness 
and Industriousness. Users who prefer to distinguish the latter two factors may thus 
do so; those who prefer the broader Conscientiousness factor have that information. 
Those who are interested in information on how Methodicalness and Industriousness 
combine to yield a certain Conscientiousness score can also be satisfied. 

Another departure of the SFPQ from the Big Five conceptualizations is our iden- 
tification of an Independence factor that has been shown to define the opposite pole 
of the Neuroticism factor (Jackson, Ashton, & Tomes, 1996). We deliberately 
avoided any attempt to label any of the SFPQ scales as "Neuroticism" for several 
reasons. First, neuroticism as a construct is poorly defined and is not represented in
standard psychiatric nomenclature such as the DSM-IV (1994). Second, professionals
have demonstrated an inability to agree on its manifestations (Gough, 1957), in
contrast to other terms like depression. Third, the blueprint for the development of 
the SFPQ was to encompass normal dimensions of personality, just as its predeces- 
sor, the PRF, did. Fourth, popular Neuroticism scales have been shown to be con- 
founded with an independent desirability factor (Jackson, Ashton, & Tomes, 1996). 
Finally, a strong Independence factor has consistently emerged from factor analyses 
of the PRF, and we believe it should have a prominent place among the factors of 
normal personality. 

A number of researchers have investigated the factor structure of the PRF since 
its introduction in the late 1960's. In some studies, the Big Five factors of personal- 
ity have been identified and claimed to account for the common variance in the PRF 
trait scales (Costa & McCrae, 1988; Paunonen, Jackson, Trzebinski, & Forsterling, 
1992; Paunonen, Keinonen, Trzebinski, Forsterling et al., 1996; Skinner, Jackson, & 
Rampton, 1976; Stumpf, 1993). In other studies, however, factor structures have 
included more than five factors. For example, Nesselroade and Baltes (1975) found 
eight oblique factors in a large sample of PRF respondents. In another study with the 
PRF, Stricker (1974) found six factors. 

Studies that have found more than five factors tended to split one of the Big Five 
Factors, Conscientiousness, into two factors. In the study by Nesselroade and Baltes 
(1975), for example, one factor was defined by Cognitive Structure, Order, and low 
Impulsivity. An equally clear factor was defined by Achievement and Endurance. 
The two factors correlated .43 when they were rotated using an oblique rotation. In 
our factor analyses we have chosen to label these factors as Methodicalness and In- 
dustriousness, respectively. 

The SFPQ was developed to realize a number of aims: (a) to measure the major
factors of normal personality functioning in a brief, compact format; (b) to develop
factor scales that meet modern standards for convergent and discriminant validity by
emphasizing, at the earliest stages of scale development, the suppression of response
biases and the optimization of relevant content, while minimizing irrelevant content
and correlations between scales; and (c) to employ personality questionnaire items that
are short, that have a clear, straightforward vocabulary, that represent characteristic
behavior that most normal adults have experienced or observed, and that have been
subjected to rigorous multivariate item analyses. Because the original item pool was
well over 3,000 items, the ratio of original to retained items was more than 30 to 1.


Development 


The foundation for the SFPQ was the Personality Research Form. Chapter 2 of the
PRF Manual (Jackson, 1984) provides a detailed outline of the steps in its development.
These steps are outlined briefly here: (a) develop careful definitions of each
pole of the 20 variables of personality based on the work of Murray (1938) and his
colleagues; (b) prepare large item pools for each personality dimension, which in the
aggregate consisted of well over 3,000 items; (c) edit and select items for empirical
evaluation; (d) conduct item analyses based on a total item pool of more than 2,700
items using a procedure designed to maximize each item's content saturation while
suppressing desirability variance and eliminating items correlating too highly with
irrelevant scales; (e) develop parallel forms based on an algorithm minimizing
statistical differences between forms; and (f) construct Form E based on a new item
analysis using an algorithm developed for the purpose of maximizing the correlation
of an item with its own scale, while minimizing correlations between scales. Formulas
and flow charts describing these procedures are contained in the PRF Manual.
The starting point for the SFPQ was the set of items contained in the PRF Form E.
These items had already been selected from the original pool with a selection ratio
of approximately 10 to 1.

We followed three distinct stages in constructing the SFPQ, which are described 
in an article by Jackson et al. (1996). The first stage was to compare five-factor and
six-factor models of personality structure using confirmatory factor analysis of PRF 
scale scores. In the second stage, we assembled a questionnaire specifically designed 
to measure the proposed six-factor model of personality, and we evaluated some of 
its psychometric properties. That questionnaire, the SFPQ, was then used in the third 
stage to assess the predictability of the separate Industriousness and Methodicalness 
factors in relation to some criteria of social importance. The first two stages are 
summarized below. The third stage is described in the Validity section. 


Comparison of the Five-Factor and Six-Factor models 


Jackson et al. (1996) tested five-factor and six-factor models of personality structure 
based on two samples of PRF respondents. The first sample consisted of 306 first-year
university undergraduates (143 men and 163 women). The second sample consisted
of 2,141 men between the ages of 17 and 24 who completed the PRF as part of
a selection procedure for a training program in the Canadian armed forces.

The first model was based on the five-factor structure reported by Costa and 
McCrae (1988). In their study, Costa and McCrae factored the 20 PRF content 
scales along with marker scales from their NEO Personality Inventory. The six- 
factor model was based on the earlier research studies about the relations among the 
20 PRF traits, and it divides Conscientiousness into two separate factors we have 
labeled Industriousness and Methodicalness. The Industriousness factor was defined 
by the PRF traits of high Achievement, high Endurance, and low Play. The Meth- 
odicalness factor was defined by high Cognitive Structure and Order, and low Im- 
pulsivity. Jackson had proposed this division of the Conscientiousness factor as 
early as 1967 in the original PRF Manual (1967/1984) based on rational and theo- 
retical considerations. Empirical support for this division is also found in the studies 
by Nesselroade and Baltes (1975), and Siess and Jackson (1970). Other minor dif- 
ferences between Costa and McCrae's (1988) five factor model and the six-factor 
model are described in Jackson et al. (1996). Results of the confirmatory factor
analyses consistently revealed that the oblique six-factor model provided a
statistically significantly better fit than did the oblique modified five-factor model
for both samples.


Evaluation of a Six-Factor personality measure 


The second stage of the development of the SFPQ was based on the previous con- 
firmatory factor analyses. The previous sample of 306 respondents and a second 
sample of 113 undergraduates completed both the PRF and the Jackson Personality 
Inventory-Revised (1994) in a true/false format. Based on knowledge of scale con- 
tent and on previous exploratory factor analytic work, three PRF scales were chosen 
to define each of the six factors identified by Jackson et al. (1996). Three facet 
scales were chosen because three variables is the minimum number required to de- 
fine a factor in common factor analysis (Thurstone, 1947), and we wished to keep 
the SFPQ brief. Seventeen of the 18 facet scales were derived from the PRF. One 
JPI-R scale, Breadth of Interest, was introduced to help define the Openness to Ex- 
perience factor because we found this scale to have a higher loading than did the 
PRF Sentience scale that it replaced. The PRF scales Harmavoidance and Nur- 
turance were omitted due to their tendency to split into several factors. 

An item analysis was conducted on the 18 trait facets for the purpose of selecting 
the best six items (three positive and three negative exemplars) from the item pool. 
The item analysis was based on the sample of 306 respondents. Item means, vari- 
ances, item-scale correlations, and the Item Efficiency Index (Neill & Jackson, 
1970) were investigated. The item analysis revealed that Sentience did not contrib- 
ute as strongly as was expected to the Openness factor. It was, therefore, replaced by 
the JPI Breadth of Interest scale. Selection of the items from the Breadth of Interest 
scale was based on an inspection of the PRF and JPI responses of the sample of 113 
undergraduates, and on extensive previous item analyses. Two confirmatory factor
analysis models, a five-factor and a six-factor model, were tested by Jackson et al.
(1996) using the sample of 113 undergraduates. A nested-model χ² comparison was
conducted between the two models. The χ² results indicated that the six-factor
solution provided a significantly better fit to the data than did the five-factor model
(χ² change = 24.77, df = 5, p < .001).
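
The reported significance level can be checked from the χ² change and its degrees of freedom. The sketch below computes the upper-tail probability of a chi-square distribution using a standard closed-form recurrence over degrees of freedom; the function name is illustrative, and this is a verification aid rather than the software used by Jackson et al. (1996).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with
    integer df, built up from df = 1 or df = 2 in steps of two."""
    if x < 0 or df < 1:
        raise ValueError("x must be non-negative and df a positive integer")
    half = x / 2.0
    if df % 2 == 0:
        q, k = math.exp(-half), 2                 # exact tail for df = 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1      # exact tail for df = 1
    while k < df:
        # Each step from k to k + 2 degrees of freedom adds one term.
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


p_value = chi2_sf(24.77, 5)
print(p_value < 0.001)
```

With a χ² change of 24.77 on 5 degrees of freedom, the tail probability falls below .001, which agrees with the significance level reported above.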


Description of the SFPQ 


The SFPQ comprises six factor scales and 18 facet scales. Three facet scales
are subsumed under each factor scale, and each facet scale is measured by six items,
resulting in a total of 108 items. The SFPQ requires approximately 20 minutes to
administer. The instructions direct respondents to answer on a five-point scale
whether they strongly disagree, disagree, are neutral, agree or strongly agree with
the statements. Facet scale scores are obtained by summing the five-point items,
taking into account the direction of keying. Factor scale scores are calculated by
summing scores for each of the three relevant facet scales. In addition to a
paper-and-pencil format, the SFPQ is also available for administration and scoring via
personal computer.
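
The scoring rule just described can be sketched in a few lines. The example assumes that responses are coded 1 (strongly disagree) to 5 (strongly agree) and that negatively keyed items are reversed as 6 minus the response; both are common conventions rather than details stated in the text. The item numbers and function names are hypothetical.

```python
def score_facet(responses, reverse_keyed, scale_min=1, scale_max=5):
    """Sum item responses for one facet, reversing negatively keyed items.

    responses: dict mapping item id -> response on the five-point scale.
    reverse_keyed: set of item ids keyed toward the negative pole.
    """
    total = 0
    for item, value in responses.items():
        if not scale_min <= value <= scale_max:
            raise ValueError(f"response out of range for item {item}")
        if item in reverse_keyed:
            value = scale_min + scale_max - value  # 5 -> 1, 4 -> 2, ...
        total += value
    return total


def score_factor(facet_scores):
    """A factor score is the sum of its three facet scores."""
    return sum(facet_scores)


# One facet: six items, three keyed positively and three negatively.
answers = {1: 5, 2: 4, 3: 5, 4: 1, 5: 2, 6: 1}
facet = score_facet(answers, reverse_keyed={4, 5, 6})
print(facet)
```

Under this coding, each six-item facet score ranges from 6 to 30 and each three-facet factor score from 18 to 90.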

The SFPQ scales were developed to be bipolar. Thus the direction of scoring for 
any given scale was arbitrarily chosen. For example, a scale called Even-tempered 
could just as well have been scored in the direction of Aggression. This arbitrariness 
is particularly true because, for each facet scale, three items were selected to repre- 
sent the positive pole of the dimension and three to represent the negative pole. 

Based on previous research with the Personality Research Form (Reddon & Jack- 
son, 1989), we judge the vocabulary used in the SFPQ items to be below the fifth 
grade level. The fifth-grade level is the reading level of the Personality Research 
Form, from which most of the items were drawn. However, in the selection process 
for the SFPQ, preference was given to short, simple items. Furthermore, we intro- 
duced minor editorial changes to simplify items drawn from the PRF. The major 
reason for choosing short, simple items is that they have been shown, in general, to 
be more valid (Holden, Fekken, & Jackson, 1985). In Table 1, factor scales, facet 
scales, and scale definitions for high scorers are provided. 


Norms 


Adult SFPQ norms were based on the responses of 1,067 participants (483 men and
584 women) from the United States and Canada. The method for obtaining names of
potential SFPQ respondents was to review U.S. census data and to identify, from a
database containing approximately 72 million telephone numbers and addresses, the
names and addresses of persons representing a geographically diverse area. Respondents
were selected randomly and a letter, together with a small monetary incentive,
was sent to 1,500 potential respondents. Table 2 provides a breakdown of the normative
sample by age, sex, education, marital status and geographic location.


Table 1. Factor scales (flush left); facet scales (indented); scale definitions for high scorers.

Extraversion              Enjoys the company of others; confident and comfortable in social
                          situations; tries to control environment and influence or direct
                          people; likes to have an audience and to be the center of attention.
  Affiliation             Enjoys being with friends and people in general; accepts people
                          readily; makes efforts to win friendships and maintain associations
                          with people.
  Dominance               Attempts to control environment and to influence or direct other
                          people; expresses opinions forcefully; enjoys the role of leader and
                          may assume it spontaneously.
  Exhibition              Wants to be the center of attention; enjoys having an audience;
                          engages in behavior that wins the notice of others; may enjoy being
                          dramatic or witty.

Agreeableness             Is considerate, likable and cooperative; accepts criticism and blame;
                          avoids confrontations and conflicts; is not easily offended.
  Abasement               Shows humility; accepts blame and criticism even when not deserved;
                          willing to accept an inferior position; tends to be self-effacing;
                          readily fulfills others' requests; is helpful.
  Even-tempered           Imperturbable when faced with instigation to anger; avoids
                          confrontations and conflicts; does not express hostility, either
                          verbally or physically; is not concerned with "getting even"; is
                          forgiving of others' mistakes.
  Good-natured            Is willing to concede mistakes; willingly changes own opinions to
                          ensure positive relationships; is not angered or upset by criticism;
                          does not respond to attack or question; is not easily offended; has
                          "nothing to hide."

Independence              Is self-determined and shows a high level of autonomy; enjoys being
                          free and unrestrained in various situations; is little concerned about
                          reputation or others' praise or disapproval.
  Autonomy                Tries to break away from restraints, confinement, or restrictions of
                          any kind; enjoys being unattached, free, not tied to people, places,
                          or obligations; may be rebellious when faced with restraints.
  Individualism           Unconcerned about reputation or social standing; insensitive to
                          others' praise or disapproval; does not necessarily conform to
                          socially-approved norms in behavior and appearance.
  Self Reliance           Does not look to others for guidance or support; is able to maintain
                          oneself without aid; has confidence in and exercises own judgment;
                          confronts problems alone; does not seek help, advice, or sympathy.

Openness to Experience    Likes change and new experiences; is curious about many areas of
                          knowledge; has a wide variety of interests.
  Change                  Likes new and different experiences; dislikes routine and avoids it;
                          may readily change opinions or values in different circumstances;
                          adapts readily to changes in environment; enjoys travel.
  Understanding           Wants to understand many areas of knowledge; values a synthesis of
                          ideas, verifiable generalizations, logical thought, particularly when
                          directed at satisfying intellectual curiosity.
  Breadth of Interest     Is attentive and involved; motivated to participate in a wide variety
                          of activities; interested in learning about a diversity of things.

Methodicalness            Does not like ambiguity; thinks before acting; is organized and neat.
  Cognitive Structure     Does not like ambiguity or uncertainty in information; wants all
                          questions answered completely; desires to make decisions based upon
                          definite knowledge rather than upon guesses or probabilities.
  Deliberateness          Acts with deliberation; is on an even keel; ponders issues and
                          decisions carefully; thinks before acting; avoids spontaneity.
  Order                   Concerned with keeping personal effects and surroundings neat and
                          organized; dislikes clutter, confusion and lack of organization;
                          interested in developing methods for keeping materials methodically
                          organized.

Industriousness           Maintains high standards of work and aspires to reach challenging
                          goals; persistent and unrelenting in work habits; is drawn more
                          towards work than play; takes a serious approach to life.
  Achievement             Aspires to accomplish difficult tasks; maintains high standards and is
                          willing to work towards distant goals; responds positively to
                          competition; willing to put forth effort to attain excellence.
  Endurance               Willing to work long hours; doesn't give up quickly on a problem;
                          persevering, even in the face of great difficulty; patient and
                          unrelenting in work habits.
  Seriousness             Is subdued in thought, appearance, and manner; takes a serious
                          approach to life and to work; does not seek fun or amusement; avoids
                          frivolity and idle pursuits.


Table 2. Characteristics of the Normative Sample

                                           Percentage
Variable                              Male    Female     Total
Age
  < 20                                 4.9       8.3       6.8
  20-29                                7.8       8.0       7.9
  30-39                               17.6      19.8      18.8
  40-49                               23.4      20.5      21.8
  50-59                               14.8      13.9      14.3
  60-69                               18.0      15.6      16.7
  70+                                 13.5      13.9      13.7
  Total                              100.0     100.0     100.0
Education
  Grade 8 or less                      2.9       1.4       2.1
  Some high school                     4.1       3.7       3.9
  High school graduate                23.1      34.3      29.2
  1-3 yrs college or university       31.9      35.0      33.6
  College or university graduate      22.2      12.8      17.0
  Post graduate education             15.8      12.8      14.2
  Total                              100.0     100.0     100.0
Marital status
  Never married                       18.1      20.3      19.3
  Married                             71.2      56.6      63.2
  Separated or Divorced                7.2      10.2       8.8
  Widowed                              3.5      12.9       8.7
  Total                              100.0     100.0     100.0
Location
  Northwest                           16.6      18.4      17.6
  Southwest                           13.3      14.8      14.1
  South Central                       15.6      15.0      15.2
  North Central                       28.9      29.2      29.1
  Pacific / Mountain                  18.1      12.4      15.0
  Canada                               7.5      10.2       9.0
  Total                              100.0     100.0     100.0

Normative information provided in the SFPQ manual includes descriptive statistics
(means, standard deviations, skewness, and kurtosis) and percentile tables for
males, females, and for the combined sample.


Appropriate populations 


Because SFPQ norms span virtually all segments of the adult population, the SFPQ 
is applicable in a wide variety of assessment contexts. The relatively modest reading 
level of the SFPQ makes it appropriate for use with a large proportion of the general 
population and, in addition, with specialized populations not necessarily possessing
high-level reading skills. Some of these specialized populations might include ado- 
lescents, psychiatric patients, prison inmates, linguistic minorities, and immigrant 
samples. The SFPQ provides a broad assessment that can be useful in counseling, 
selection, and research. As with most instruments of assessment, the SFPQ should 
be used in conjunction with other reliable information in making decisions about 
respondents. 


Reliability 


Cronbach alpha values for the six factor scales and the individual facet scales of the
SFPQ are presented in Table 3. These reliabilities are from two sources, the North
American normative sample (N = 1,067) and an Oregon community sample gathered
by Lewis R. Goldberg (N = 671, personal communication). It can be seen that these
values range from .76 to .86 (median = .81) for the factor scales and from .55 to .84
(median = .65) for the facet scales in the normative sample. These internal consistency
reliabilities should be interpreted in light of the breadth of the factor scales and
the relative shortness of the facet scales (6 items each).
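
To make the point about scale length concrete, the following is a minimal sketch of how coefficient alpha is computed and how the Spearman-Brown formula projects the reliability of a lengthened scale. The item responses are hypothetical, and the function names `cronbach_alpha` and `spearman_brown` are illustrative rather than taken from the SFPQ manual.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Coefficient alpha; item_scores holds one list of respondent scores per item."""
    k = len(item_scores)
    # Total score of each respondent, summed across items.
    totals = [sum(person) for person in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


def spearman_brown(reliability, length_factor):
    """Projected reliability when a scale is lengthened by length_factor."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# Hypothetical responses: 3 items answered by 5 respondents (not SFPQ data).
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
print(round(cronbach_alpha(items), 2))

# A 6-item facet scale with alpha = .65, tripled to 18 items of similar quality.
print(round(spearman_brown(0.65, 3), 2))
```

Under these assumptions, a facet scale at the median reliability of .65 would be projected to reach roughly .85 if tripled in length, which is in the neighborhood of the values reported for the 18-item factor scales.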


Validity 


Convergent and discriminant validity 


The purpose of this stage of the SFPQ development was to examine the convergent 
and discriminant validity of the six-factor model within the framework of the multi- 
trait-multimethod matrix (Campbell & Fiske, 1959). A sample of 94 paid volunteer 
undergraduates (34 men and 60 women) were asked to participate in a roommate 
rating study. Participants were same-sex roommate pairs from a university residence 
who had lived together for at least seven months. They were first asked to provide a 
self-report of their behavior by completing the SFPQ. They were then asked to rate 
their roommates' behavior using the SFPQ. 

The multitrait-multimethod matrix framework requires the assessment of two or 
more traits by two or more methods. The validity coefficients (correlations between 
measures of the same trait by two different methods) should be higher than the cor- 
relations between different traits measured by different methods (heterotrait- 
heteromethod correlations), and also higher than the correlations for different traits 
measured by the same method (heterotrait-monomethod). The correlation matrix of 




Table 3. Internal Consistency Reliability


                                     Cronbach's alpha
Scale                         Normative Sample    Oregon Sample
Extraversion                        .86                .88
  Affiliation                       .73                .78
  Dominance                         .83                .86
  Exhibition                        .77                .80
Agreeableness                       .80                .78
  Abasement                         .60                .54
  Even-Tempered                     .66                .65
  Good-Natured                      .60                .58
Independence                        .76                .78
  Autonomy                          .54                .59
  Individualism                     .72                .74
  Self-Reliance                     .57                .57
Openness to Experience              .81                .82
  Change                            .61                .63
  Understanding                     .78                .74
  Breadth of Interest               .65                .69
Methodicalness                      .84                .83
  Cognitive Structure               .55                .56
  Deliberateness                    .66                .68
  Order                             .84                .78
Industriousness                     .77                .69
  Achievement                       .58                .47
  Endurance                         .61                .59
  Seriousness                       .65                .61


SFPQ traits' self-ratings and peer ratings appears in Table 4. It can be seen that the 
validity coefficients are relatively high (mean r = .56) and significant (all p < .01).
These coefficients were generally higher than the heterotrait-heteromethod correla- 
tions (mean absolute r = .09) and higher than the heterotrait-monomethod correla- 
tions (not shown). The mean absolute heterotrait-monomethod correlation was .13 
for self-ratings and .20 for peer ratings. Table 4 also reveals a modest average cor- 
relation (r = .41) obtained between Methodicalness and Industriousness. This corre- 
lation was expected based on the conceptual relatedness of these two constructs. 
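
The Campbell and Fiske comparison described above can be sketched in a few lines. The three-trait block below is hypothetical (it is not the SFPQ matrix), and `mtmm_summary` is an illustrative name: rows are peer-rated traits, columns are self-rated traits, so the diagonal holds the validity coefficients and the off-diagonal cells hold the heterotrait-heteromethod correlations.

```python
def mtmm_summary(block):
    """Summarize a heteromethod block; block[i][j] = r(peer trait i, self trait j)."""
    n = len(block)
    validity = [block[i][i] for i in range(n)]
    hetero = [abs(block[i][j]) for i in range(n) for j in range(n) if i != j]
    # Each validity coefficient should exceed every other entry in its row and column.
    dominates = all(
        block[i][i] > max(abs(block[i][j]), abs(block[j][i]))
        for i in range(n) for j in range(n) if i != j
    )
    return sum(validity) / n, sum(hetero) / len(hetero), dominates


# Hypothetical peer-by-self correlations for three traits.
block = [
    [0.50, -0.10, 0.05],
    [0.00, 0.60, 0.10],
    [-0.05, 0.15, 0.55],
]
mean_validity, mean_hetero, dominates = mtmm_summary(block)
print(mean_validity > mean_hetero, dominates)
```

The same comparison against the heterotrait-monomethod correlations would use the within-method correlation matrices in place of the heteromethod block.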


Composite direct product analysis 


A composite direct product analysis (Browne, 1984) was used by Jackson et al. 
(1996) as a substitute for the more common confirmatory factor analytic procedure 
for multitrait-multimethod (MTMM) matrices because the interpretation of
multitrait-multimethod results with only two methods of measurement is generally not
possible due to the under-determination of factors (see Goffin, 1988; Goffin & Jack- 
son, 1992; Kenny & Kashy, 1992; Widaman, 1985). The MUTMUM computer pro- 
gram (Browne, 1990) was used to conduct this analysis. The overall fit of a six- 
Table 4. SFPQ convergent and discriminant correlation coefficients based on self and peer ratings


                                           Self Ratings
Peer Ratings                     AG      EX      IP      OP      IT      ME
Agreeableness (AG)              .45**   -.14    -.02    -.13     .00     .04
Extraversion (EX)              -.02      28      .04     .16     .04    -.16
Independence (IP)              -.04    -.21     .56**    .01    -.05    -.16
Openness to Experience (OP)    -.07     РОВ     siti     ЭЗ      .10    -.21
Industriousness (IT)           -.13    -.14    -.03      .10     557     .33*
Methodicalness (ME)             .00    -.16    -.02     -.11     dom     .69**


Note: *p < .01, **p < .001. N = 94. Convergent validity coefficients in italics. Correlations
above the diagonal are based on self-ratings and those below the diagonal are based on peer
ratings.


factor model and a five-factor model were compared in the context of the present 
MTMM data. Method factors were specified in addition to trait factors. The models 
incorporate method factors to account for method variance, and this allows one to 
evaluate the possibility that one or more of the trait factors are composed mainly of 
method variance. The five-factor model tested was identical to the six-factor model, 
with the exception that the correlation between the Methodicalness and Industrious- 
ness factors was constrained to unity in the former case. 

The five-factor oblique model applied to the MTMM data resulted in a χ² = 77.8,
df = 44. The three fit indices, Expected Cross-Validation Index (ECVI; Browne &
Cudeck, 1989), Root Mean Square Error of Approximation (RMSEA; Steiger,
1989), and Relative Noncentrality Index (RNI; McDonald & Marsh, 1990), for this
model were 1.57, 0.09, and 0.90, respectively. The six-factor oblique model resulted
in a χ² = 51.1, df = 43. The ECVI, RMSEA, and RNI for this model were 1.30, 0.05,
and 0.98, respectively. These results indicated that the six-factor model had a
consistently better fit than did the five-factor model on all indices. The χ² difference
test of nested models revealed a significant increment in model fit by specifying six
factors (χ² change = 26.68, df = 1, p < .001) rather than five. These results are
consistent with those of Stages 1 and 2, indicating that a six-factor model provides a
better fit than does a five-factor model for the personality domain encompassed by
the variables of the PRF.
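
As a check on the arithmetic behind these comparisons, the sketch below recomputes the nested-model χ² difference test and the RMSEA values from the reported statistics. It assumes N = 94 (the roommate sample) and the common RMSEA formula with N - 1 in the denominator; the original program may have used a slightly different convention. The one-degree-of-freedom tail probability uses the identity between a χ² variate with 1 df and a squared standard normal deviate.

```python
import math


def chi2_upper_tail_df1(x):
    """P(chi-square with 1 df > x), via the squared-normal identity."""
    return math.erfc(math.sqrt(x / 2.0))


def rmsea(chi2, df, n):
    """Root mean square error of approximation (assumed N - 1 convention)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


n = 94  # assumed sample size for the MTMM analysis
five_factor = {"chi2": 77.8, "df": 44}
six_factor = {"chi2": 51.1, "df": 43}

delta_chi2 = five_factor["chi2"] - six_factor["chi2"]
delta_df = five_factor["df"] - six_factor["df"]
p_value = chi2_upper_tail_df1(delta_chi2)

print(f"chi-square change = {delta_chi2:.1f}, df = {delta_df}, p < .001: {p_value < 0.001}")
print(f"RMSEA five-factor = {rmsea(five_factor['chi2'], five_factor['df'], n):.3f}")
print(f"RMSEA six-factor  = {rmsea(six_factor['chi2'], six_factor['df'], n):.3f}")
```

With these inputs the recomputed RMSEA values agree with the reported .09 and .05 to two decimals, and the χ² change of about 26.7 on 1 df is far beyond the .001 critical value.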


Criterion predictability 


Jackson et al. (1996) addressed the differential predictability of the Industriousness
and Methodicalness factor scales. If the two domains of behavior really pertain to
one and the same personality factor (i.e., Conscientiousness), then they should both
predict the same criterion variables approximately equally well (or equally poorly).
If, instead, they are distinct (albeit correlated) factors, then they will each add
incrementally to the other in the prediction of certain criteria. To evaluate the situation
with the SFPQ, the Industriousness and Methodicalness factor scales were correlated
with various criterion variables measured in a sample of 94 dormitory roommates.
Industriousness predicted grade point average (r = .24, p < .05) to a moderate degree,
whereas Methodicalness did not (r = -.01, ns). Methodicalness was negatively
correlated with smoking behavior (r = -.25, p < .05), but the correlation between
Industriousness and smoking behavior was not significant (r = -.16, ns). Finally,
Methodicalness was negatively correlated with selection of a liberal arts program of
study (r = -.22, p < .05), but Industriousness was not significantly correlated with
this criterion (r = -.07, ns). This differential pattern of predictive validities between
Industriousness and Methodicalness supports the utility of our conceptualization of
two correlated facets of Conscientiousness within a six-factor model of personality
structure. However, these findings need to be verified in a more definitive manner,
such as the development of a more extensive array of criterion variables for which a
panel of knowledgeable judges would make differential predictions for industriousness
and methodicalness, and the comparison of these predictions with empirical
results.
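
The pattern of significant and nonsignificant correlations reported above can be reproduced approximately with a short calculation. The sketch below uses the Fisher z approximation with N = 94, which is an assumption on our part; the original analysis presumably used the conventional t test for a correlation, which gives very similar answers at this sample size. The name `fisher_z_p` is illustrative.

```python
import math


def fisher_z_p(r, n):
    """Two-tailed p-value for H0: rho = 0, using the Fisher z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))


n = 94
reported = {
    "Industriousness x GPA": 0.24,
    "Methodicalness x GPA": -0.01,
    "Methodicalness x smoking": -0.25,
    "Industriousness x smoking": -0.16,
    "Methodicalness x liberal arts": -0.22,
    "Industriousness x liberal arts": -0.07,
}
for label, r in reported.items():
    p = fisher_z_p(r, n)
    verdict = "p < .05" if p < 0.05 else "ns"
    print(f"{label}: r = {r:+.2f}, {verdict}")
```

Each correlation falls on the same side of the .05 threshold as reported in the text.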


Confirmatory factor analysis of the normative sample 


Correlations among the SFPQ factor scales and the facet scales are presented in Ta- 
ble 5. The correlations in the bottom triangle are based on the normative sample; 
the correlations in the top triangle are based on a community sample collected by L. 
R. Goldberg (1999). The correlation between Methodicalness and Industriousness is 
.41 in the normative sample and .34 in the Oregon sample. There is also a modest
correlation between Extraversion and Openness to Experience (r = .36 in the nor- 
mative sample and r = .28 in the Oregon sample). The other correlations between 
factor scales are small, suggesting that there is minimal overlap among the scales. 

A confirmatory factor analysis of the SFPQ was performed on the normative 
sample. The standardized solution is presented in Tables 6 and 7, with Table 7 con- 
taining the correlations among the SFPQ factors. All loadings shown are statistically 
significant. The solution revealed a χ² = 1209.20 (df = 122, p < .01), an ECVI = 1.22,
and an RMSEA = .09. These results are similar to those found by Jackson et al.
(1996), confirming the presence of six factors. 


Relations between the SFPQ and five other personality inventories


Evidence of convergent validity of the SFPQ has been found by inspecting its statis- 
tical relations with five other personality inventories. The data were from a large 
collaborative research project on the structure of personality conducted by L. R. 
Goldberg at the Oregon Research Institute. Goldberg (1999) collected personality 
Table 6. Confirmatory Factor Analysis of the Normative Sample


                                          Factors
Facet Scales             EX      AG      IP      OP      ME      IT
Affiliation             .66
Dominance               .51
Exhibition              .91
Abasement                       .66
Even-tempered                   357
Good-natured                    .85
Autonomy                                .69
Individualism                           .43
Self reliance                           6)
Change                                          .42
Understanding                                   .71
Breadth of Interest                             .92
Cognitive Structure                                     .70
Deliberateness                                          277.
Order                                                   .62
Achievement                                                     enti
Endurance                                                       .78
Seriousness                                                     .45


Note: EX = Extraversion, AG = Agreeableness, IP = Independence, OP = Openness to Experience, ME =
Methodicalness, IT = Industriousness; blank cells refer to parameters fixed at zero.


data on an Oregon community sample of over 800 participants using several well 
known personality inventories. In Table 8 we present the correlations between the 
factor scales of the SFPQ and the NEO Personality Inventory-Revised (NEO-PI-R; 
Costa & McCrae, 1992). 

This section also presents a summary of the results regarding relations between 
the SFPQ factor scales and those of the 16 Personality Factor Questionnaire (16 PF; 
Cattell, Cattell, & Cattell, 1993), the NEO Personality Inventory-Revised (NEO-PI- 
R; Costa & McCrae, 1992), the California Psychological Inventory (CPI; Gough, 


Table 7. Correlations among factors 


Factors 
Factor Scales EX AG IP OP ME IT 
Extraversion (EX) 
Agreeableness (AG) -.09 
Independence (IP) -.20 
Openness to Experience (OP) .39 as 
Methodicalness (ME) -.09 sil? 11 
Industriousness (IT) 212 227 .30 .54 


Note: EX = Extraversion, AG = Agreeableness, IP = Independence, OP = Openness to Experience, ME =
Methodicalness, IT = Industriousness; blank cells refer to parameters fixed at zero.
Table 8. Correlations between the SFPQ factor scales and NEO-PI-R factor scales


NEO-PI-R Scales 


SFPQ Factor Scales Ex Ag N Op Co 
Extraversion" : “Al 16б =) 7s ‚.19 
Agreeableness | .00 аии -.04 .00 
Independence -.25 -.13 -.22 .14 -.01
Openness to Experience .23 -.03 -.10 .67 -.04 
Methodicalness TEE: E03 .08 -.19 -.27 .66 
Industriousness .00 .02 -.10 -.03 .44 


Note: Ex = Extraversion, Ag = Agreeableness, N = Neuroticism, Op = Openness to Experience, Co =
Conscientiousness.


1996), the Hogan Personality Inventory (HPI; Hogan & Hogan, 1995), and the Jackson
Personality Inventory-Revised (JPI-R; Jackson, 1994). The relations between
the SFPQ and the other personality inventories are presented in the SFPQ manual.
Some important results are summarized in Table 9. Specifically, we list each SFPQ
factor scale and its nine highest correlations with scales from the other five
inventories. These correlations generally support the convergent validity of the SFPQ
factor scales with respect to conceptually related published personality scales.


Relations between the SFPQ and the NEO-PI-R 


A study by Jackson, Ashton, and Tomes (1996) provided evidence in support of the 
hypothesis that the present six-factor model of personality structure can be repre- 
sented by NEO-PI-R scales, as well as those of the SFPQ. A sample of 144 under- 
graduate university students (80 men and 64 women) completed the SFPQ and NEO 
Personality Inventory-Revised (NEO-PI-R; Costa & McCrae, 1992). The NEO-PI-R 
is a 240-item questionnaire measuring the Big Five factors with 30 facet scales. To 
investigate the extent to which social desirability variance is present in the NEO-PI- 
R and the SFPQ, two desirability scales from the AA and BB forms of the PRF were 
converted to a five-point response format and included in the study. In contrast to 
the approach taken in the construction of the PRF and the SFPQ, where response 
styles, including social desirability, were minimized at several stages of test devel- 
opment, the selection of NEO-PI-R items did not include explicit minimization of 
social desirability variance (Costa & McCrae, 1988, p. 259). The authors of the
NEO-PI-R argued that it would be difficult to measure inherently desirable
traits if socially desirable items were removed. However, a serious problem
can emerge when response style variance is confounded with content variance. The 
presence of response style variance can contribute to correlations between scales and 
therefore limit the convergent and discriminant validity of the test. 

A principal components analysis was conducted on the combined SFPQ and 
NEO-PI-R data, followed by an orthogonal rotation to a targeted criterion (Schönemann,
1966). A procedure proposed by Paunonen (1997) to evaluate whether or not
results from the targeted rotation can be attributed to capitalization on chance was 
also used. 
Table 9. Correlations between the SFPQ factor scales and facet scales from other personality
inventories


SFPQ Extraversion                        SFPQ Agreeableness
 .79  Social Confidence (JPI-R)           .56  Compliance (NEO PI-R)
 .72  Sales Potential (HPI)              -.54  Angry Hostility (NEO PI-R)
 .71  Social Boldness (16 PF)             .51  Service Orientation (HPI)
 .71  Sociability (CPI)                   .50  Empathy (HPI)
 .70  Dominance (CPI)                    -.48  Tension (16 PF)
 .65  Ambition (HPI)                      .48  No hostility (HPI)
 .65  Self Acceptance (CPI)               .46  Adjustment (HPI)
 .64  Sociability (HPI)                   .44  Even-tempered (HPI)
 .64  Assertiveness (NEO PI-R)            .40  Good Impression (CPI)

SFPQ Independence                        SFPQ Openness to Experience
-.62  Cooperativeness (JPI-R)             .78  Breadth of Interest (JPI-R)
-.51  Sociability (JPI-R)                 .59  Openness to Change (16 PF)
-.43  Not autonomous (HPI)                .59  Ideas (NEO PI-R)
-.41  Gregariousness (NEO PI-R)           .59  Complexity (JPI-R)
 .39  Self Reliance (16 PF)               .56  Aesthetics (NEO PI-R)
-.39  Empathy (JPI-R)                     .56  Innovation (JPI-R)
-.34  Warmth (16 PF)                      .53  Actions (NEO PI-R)
-.33  Anxiety (JPI-R)                     .52  Culture (HPI)
-.32  Appearance (HPI)                    .51  Intellectance (HPI)

SFPQ Methodicalness                      SFPQ Industriousness
 .71  Organization (JPI-R)                .46  Achievement Striving (NEO PI-R)
 .66  Order (NEO PI-R)                    .39  Self-Discipline (NEO PI-R)
 .64  Perfectionism (16 PF)               .34  Dutifulness (NEO PI-R)
 .53  Deliberation (NEO PI-R)             .31  Mastery (HPI)
-.50  Flexibility (CPI)                   .31  Energy Level (JPI-R)
 .50  Self Discipline (NEO PI-R)          .30  Organization (JPI-R)
 .48  Dutifulness (NEO PI-R)              .29  Perfectionism (16 PF)
-.47  Abstractedness (16 PF)              .29  Activity (NEO PI-R)
 .40  Prudence (HPI)                      .27  Competitive (HPI)


Note: All correlations are significant, p < .01. JPI-R = Jackson Personality Inventory-Revised
(Jackson, 1994), HPI = Hogan Personality Inventory (Hogan & Hogan, 1995), NEO PI-R = Revised NEO
Personality Inventory (Costa & McCrae, 1992), 16 PF = 16 Personality Factor Questionnaire (Cattell,
Cattell, & Cattell, 1993), CPI = California Psychological Inventory (Gough, 1996).


Six factors corresponding to those in the SFPQ were hypothesized, plus one for 
the Desirability scales. For the Extraversion, Agreeableness, and Openness to Expe- 
rience factors, all SFPQ and NEO-PI-R facet scales originally designed to assess 
these factor scales were targeted on their appropriate factor. Scales targeted for the 
Independence (low Neuroticism) factor included the three Independence facet scales 
from the SFPQ and the Neuroticism facet scales (assigned negative target loadings) 
from the NEO-PI-R. The targeted variables for the Methodicalness and Industrious- 
ness factors included the respective facet dimensions from the SFPQ. Furthermore, 
all the Conscientiousness facet scales from the NEO-PI-R were targeted on both 
Methodicalness and Industriousness factors. The seventh factor was targeted using 
the two desirability scales and one facet scale from each of the SFPQ and NEO-PI-R 
factor scales which correlated most strongly (in the present analysis) with the desirability
scales. The reason for targeting one facet scale from each of the NEO-PI-R
and SFPQ factors was to ensure that the desirability factor so derived would be de- 
termined by a heterogeneous set of content scales rather than only those content 
scales most highly related to desirability. 

Although the rotated matrix of factor loadings is not presented here, it can be 
found in the article by Jackson, Ashton, and Tomes (1996). However, Table 10 dis- 
plays the mean factor loadings for targeted NEO-PI-R and SFPQ scales on the six 
factors. It can be seen that the average loading is .63 for the SFPQ and .54 for the
NEO-PI-R, indicating that both measures define the six factors, with a somewhat 
higher average loading for the SFPQ. 

One potential issue regarding the comparison of the NEO-PI-R and the SFPQ is
that the NEO-PI-R Conscientiousness scales were targeted on both the Methodicalness
and Industriousness factors. This might lead to a reduced average loading when
the two factors are considered together. One alternative is to calculate the average
loading for the NEO-PI-R on the Methodicalness and Industriousness factors by
using only the three highest Conscientiousness scale loadings. This results in a
mean loading of .59 for the Methodicalness factor, .57 for the Industriousness
factor, and .58 for the overall average. It is particularly noteworthy that the SFPQ
achieves at least a comparable level of factor separation and loading magnitude to
the NEO-PI-R with less than half the items (108 vs. 240).

Also found in Table 10 are the desirability saturation values for each SFPQ and
NEO-PI-R factor. These values were obtained by averaging the corresponding
loadings for each SFPQ facet scale and each NEO-PI-R facet scale on the Desirability
factor. It can be seen that the saturation values are noticeably higher for all of the
five NEO-PI-R factors than they are for the corresponding SFPQ factors. On average,
the desirability saturation value is higher in the NEO-PI-R than it is in the
SFPQ (mean = .30 vs .17).
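
The averages quoted from Table 10 can be recomputed directly from the tabled entries. The sketch below stores each entry in hundredths to avoid floating-point rounding surprises; the value of 17 for SFPQ Agreeableness desirability is our assumed reading of a poorly printed cell, and `table_mean` is an illustrative helper name.

```python
def table_mean(hundredths):
    """Mean of tabled values that are stored as integers in hundredths."""
    return sum(hundredths) / len(hundredths) / 100


# Order: Extraversion, Agreeableness, Openness, Independence,
# Industriousness, Methodicalness.
sfpq_loadings = [69, 62, 68, 57, 60, 60]
neo_loadings = [54, 56, 62, 61, 46, 47]
sfpq_desirability = [16, 17, 14, 17, 24, 11]  # second entry is an assumed reading
# NEO-PI-R has a single Conscientiousness entry, hence five values.
neo_desirability = [31, 41, 23, 27, 26]

for label, values in [
    ("SFPQ mean loading", sfpq_loadings),
    ("NEO-PI-R mean loading", neo_loadings),
    ("SFPQ mean desirability saturation", sfpq_desirability),
    ("NEO-PI-R mean desirability saturation", neo_desirability),
]:
    print(f"{label}: {table_mean(values):.3f}")
```

To two decimals these means correspond to the tabled averages of .63, .54, .17, and .30.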


SFPQ modal profiles


The SFPQ provides a set of scale scores that can be interpreted individually or as a 
series of standardized scale scores in a profile. To interpret an individual's profile, 
one could, for example, identify the scores that are at or below the 16th percentile
and at or above the 84th percentile. This exercise would provide a clear description
of an individual's salient attributes. In addition to this descriptive information, one
might be interested in evaluating the similarity of an individual's pattern of personality
attributes to 'typical' or 'modal' profiles for some group. A modal profile refers
to the pattern of personality attributes that is characteristic of a subset or cluster of 
persons in a particular population who share certain high and low personality char- 
acteristics. 

In some cases, typical profiles have been extensively researched and documented. 
Consider the example of Type A behavior, in which modal profile analysis casts 
additional light on what was believed to be a unitary syndrome. It was commonly 
Table 10. Mean factor loadings and desirability saturation for targeted NEO-PI-R and SFPQ scales


                               Factor Loadings       Desirability Saturation
Factors                        SFPQ    NEO PI-R        SFPQ    NEO PI-R
Extraversion                   0.69      0.54          0.16      0.31
Agreeableness                  0.62      0.56          0.17      0.41
Openness to Experience         0.68      0.62          0.14      0.23
Independence                   0.57      0.61          0.17      0.27
Industriousness                0.60      0.46          0.24
Methodicalness                 0.60      0.47          0.11
Conscientiousness                                                0.26
Average                        0.63      0.54          0.17      0.30


believed that the Type A behavior pattern was characterized by a sense of time ur- 
gency, proneness to anger and hostility, high competitiveness, impatience, and 
achievement strivings. However, Gray, Jackson, and Howard (1990) demonstrated 
that there was not a strong unitary Type A pattern. Rather, three distinct profiles of 
Type A behavior were identified, each with its unique patterning of high and low 
points. 

Consider another application of profile analysis. That method was used exten- 
sively by Jackson and Williams (1975) in the development of interpretive informa- 
tion for the Jackson Vocational Interest Survey (Jackson, 2000). Profiles of profes- 
sional and academic interests were developed for different occupational groups. For 
example, a cluster of several engineering specialists (e.g., mechanical, electronic,
civil) emerged defined by high interests in mathematics, physical science, engineer- 
ing, and skilled trades, and low interests in social service, elementary education, and 
author-journalism, among others. 

In other cases, such as that described below for the SFPQ, profile analysis has 
been used as an exploratory procedure to discover patterns and similarities among 
people and attributes. This particular application is useful in the early stages of a 
research program. In the case of the SFPQ, no specific patterns of attributes were 
hypothesized and, therefore, modal profile analysis was used as an exploratory
procedure. Once the validity of a particular profile has been established, that profile
can be considered a new conception for describing particular types of individuals. One
initial step in establishing the stability of such a profile or set of profiles is its
replication in diverse samples drawn from different populations of individuals.

The first author and his colleagues have maintained an active interest in the clas- 
sification of personality and the development of procedures to derive modal profiles. 
This work has evolved into a statistical procedure labeled modal profile analysis 
(Jackson & Williams, 1975; Skinner, Jackson, & Hoffmann, 1974; Skinner, Reed, & 
Jackson, 1976). This technique is a multivariate classification strategy. Unlike clustering
techniques, which place entities displaying specific attributes into discrete
categories, modal profile analysis locates clusters of entities in a multidimensional 
space. This is done by performing a singular value decomposition (related to princi- 
pal components analysis) using a data matrix in which the columns are people and 
the rows are personality attributes. Whereas principal components analysis (or factor
analysis) is usually used to find patterns among a subset of variables, the objective 
of modal profile analysis is to find patterns among people. The principal component 
scores represent the projections of each attribute in a multivariate space defined by 
the entities (in the present case, people). 
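
The core computation can be illustrated with a small sketch. It is not the published modal profile analysis program: it extracts only the first dimension of the singular value decomposition, by power iteration, from a matrix whose rows are attributes and whose columns are people, and it assumes the scores have already been standardized. The function name `first_modal_profile` and the data are hypothetical.

```python
import math


def first_modal_profile(data, iterations=200):
    """First singular triple of data (rows = attributes, columns = people).

    Returns (profile, person_weights, singular_value), where profile is the
    unit-length pattern of attribute scores for the first dimension.
    Assumes data is not all zeros.
    """
    n_attributes, n_people = len(data), len(data[0])
    # Deterministic, non-uniform starting weights for the people.
    weights = [float(j + 1) for j in range(n_people)]
    for _ in range(iterations):
        # Attribute scores implied by the current person weights (X v).
        scores = [sum(row[j] * weights[j] for j in range(n_people)) for row in data]
        # Person weights implied by those attribute scores (X' u), renormalized.
        weights = [
            sum(data[i][j] * scores[i] for i in range(n_attributes))
            for j in range(n_people)
        ]
        norm = math.sqrt(sum(w * w for w in weights))
        weights = [w / norm for w in weights]
    scores = [sum(row[j] * weights[j] for j in range(n_people)) for row in data]
    singular_value = math.sqrt(sum(s * s for s in scores))
    profile = [s / singular_value for s in scores]
    return profile, weights, singular_value


# Hypothetical matrix: 2 attributes (rows) measured on 3 people (columns).
data = [
    [3.0, 6.0, 6.0],
    [4.0, 8.0, 8.0],
]
profile, weights, singular_value = first_modal_profile(data)
print([round(p, 2) for p in profile], round(singular_value, 2))
```

In this toy example all three people share one pattern (the second attribute higher than the first), so a single dimension reproduces the matrix; with real data, further dimensions would be extracted and retained on the basis of their eigenvalues.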

One important advantage of modal profile analysis over other numerical cluster- 
ing procedures is that it does not require that an individual be placed in a discrete 
category. Instead, one can evaluate the extent to which an individual fits a modal 
profile on a continuous similarity scale. For example, one could correlate an indi- 
vidual's profile with various modal profiles and compare the fit by examining the 
magnitude of the correlations of the individual's profile with each modal profile. In 
general, there will not usually be a perfect fit between an individual and a given pro- 
file; however, the individual's profile will usually resemble one modal profile more 
closely than it will resemble other profiles. 


Description of the SFPQ modal profiles 


The SFPQ male and female modal profiles were derived from the normative sample. 
Five profiles were derived for each sex, based on an inspection of the eigenvalues. 
Each profile represents a bipolar dimension and can therefore be reversed to produce
the 'negative' pole by subtracting the value of each attribute in the profile from the
maximum scale score. The five profiles
are arbitrarily labeled as representing the positive pole of the typal dimensions.
Table 11 presents the percentile scores for the positive and negative exemplars of the 
five male and five female SFPQ modal profiles. It was necessary to define distinct 
male and female modal profiles because measured personality does differ markedly 
between males and females as reflected both in mean scores and in the organization 
of types. We deemed it more appropriate to center each sex's set of modal profiles 
on sex-based norms, rather than using combined norms because the latter strategy 
would spuriously confound typal differences with mean differences. 


Comparison of classification efficiencies 


Using the criterion that an individual's profile must correlate at least .50 with a 
given modal profile to be considered classifiable, the classification rate was 72.6 per 
cent for males and 66.5 per cent for females in the SFPQ normative sample. Persons 
not classified with respect to a profile type can be considered as representing mixed 
or infrequently occurring types. 
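
The classification rule can be sketched as follows. The modal profiles and individual profiles below are hypothetical six-factor percentile patterns, not the entries of Table 11, and the helper names `pearson`, `classify`, and `classification_rate` are illustrative. Each pole is treated as a separate candidate profile, and an individual is assigned to the best-fitting profile only if the correlation reaches .50.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def classify(profile, modal_profiles, threshold=0.50):
    """Return (label, r) for the best-fitting modal profile; label is None if r < threshold."""
    best_label, best_r = None, -1.0
    for label, modal in modal_profiles.items():
        r = pearson(profile, modal)
        if r > best_r:
            best_label, best_r = label, r
    return (best_label if best_r >= threshold else None), best_r


def classification_rate(profiles, modal_profiles, threshold=0.50):
    """Proportion of individuals whose best correlation reaches the threshold."""
    labels = [classify(p, modal_profiles, threshold)[0] for p in profiles]
    return sum(label is not None for label in labels) / len(labels)


# Hypothetical percentile profiles on six factor scales.
modal_profiles = {
    "1+": [80, 20, 50, 60, 30, 70],
    "2+": [30, 70, 60, 20, 80, 40],
}
person_a = [75, 25, 55, 65, 35, 60]  # resembles profile 1+
person_b = [50, 52, 49, 51, 50, 48]  # nearly flat, resembles neither profile

print(classify(person_a, modal_profiles)[0])
print(classify(person_b, modal_profiles)[0])
print(classification_rate([person_a, person_b], modal_profiles))
```

Because the fit is expressed as a correlation rather than a category, the same function also shows how closely an unclassified person approaches the nearest profile.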


Individuals representing modal profiles 


Provided in Figure 1 are the SFPQ profiles of two individuals in the normative sample.
The first profile depicts a male who is Independent, somewhat Open to Experience,
somewhat Industrious, not Agreeable, and not Methodical. His scores on the
Table 11. SFPQ modal profile percentiles
three Extraversion facet scales differ substantially, showing an average level of Dominance
and Exhibition but a very low level of Affiliation. It is noteworthy that he
displays a relatively high level of Industriousness but a low level of Methodicalness. 
This is an example of a case where information would be lost if Industriousness and 
Methodicalness scales were aggregated into a Conscientiousness scale. Instead, the 
SFPQ Methodicalness-Industriousness configuration reveals someone who works 
hard but is not well organized. This male profile correlates .64 with the male modal 
profile 4+ presented in Table 11. 
Figure 1. SFPQ profiles of a male and a female in the normative sample
The second profile depicts a female who is somewhat Extraverted, somewhat Industrious,
not Open to Experience, and not Methodical. She also has a moderate level of
Agreeableness and Independence (although a low score on Individualism). This fe- 
male profile correlates .75 with the female modal profile 5+ (see Table 11). 


Research using modal profiles 


The evaluation of SFPQ modal profiles can be viewed as an alternative to the more
typical evaluation of relationships of single personality variables to external criterion
variables. Using modal profiles, one is interested in the external characteristics or
behaviors associated with typical individuals who define a given modal profile. For
example, are certain profile types likely to be more satisfied or more effective
workers? Or, as has been done with the Personality Research Form (Jackson, Peacock,
& Smith, 1980; Rothstein & Jackson, 1980), one could evaluate the judged
suitability or probable success of job applicants showing different modal profiles.
The relation of personality profile membership to vocational preferences and interests
(Siess & Jackson, 1970) is another possibility.


Conclusions 


The structure of the SFPQ has been supported by a number of confirmatory factor 
analyses. Evidence of convergent and discriminant validity also is presented. Note- 
worthy is the average uncorrected validity of .56 when the SFPQ factor scales are 
correlated with peer ratings. The separation of the Conscientiousness factor into 
Methodicalness and Industriousness was substantiated by several confirmatory fac- 
tor analyses and also by the predictive validity indices. These promising results were 
obtained even though the SFPQ was developed in such a way as to suppress the de- 
sirability confound. Indeed, SFPQ scales show low desirability factor loadings and 
correlations with desirability scales. Given the need for brief yet comprehensive 
assessment measures, the SFPQ appears well suited to measure personality on a 
broad range of personality traits. 
Chapter 16


Zuckerman-Kuhlman Personality Questionnaire 
(ZKPQ): An alternative five-factorial model 


Marvin Zuckerman 


Introduction 


The ZKPQ was developed as the result of an attempt to define the basic factors of 
personality or temperament. The question arose in preparation for writing my book
"Psychobiology of Personality" (Zuckerman, 1991). The book needed some framework
for a top-down approach from personality traits through intermediate
biological levels to the genetic bases of personality traits (Zuckerman, 1993a). But
what is a basic factor and which factors are basic? (Eysenck, 1992; Zuckerman,
1992). Factor analysis has been the classical method used to answer these questions. 
However, as we all know, what you get out of a factor analysis is limited by what 
you put into it. Our guiding assumption was that basic personality traits are those 
with a strong biological-evolutionary basis. Therefore we started with scales which 
had been used in psychobiological research and embodied concepts amenable to 
translation into comparative behavior among other species. For example, aggression 
rather than agreeableness, and impulsive sensation seeking rather than conscien- 
tiousness. Sensation seeking has been shown to have many biological correlates 
(Zuckerman, Buchsbaum, & Murphy, 1980) and to be a useful comparative model 
(Zuckerman, 1984). However, when it has been included in other systems it is usu- 
ally in the form of a single scale ignoring the facets or subtypes of the trait which 
have differential associations with some biological traits. 

The "Big Five" originated in lexical analyses of words (generally adjectives) with 
connotations for personality. We started with scales which had been used as meas- 
ures of temperament or involved in psychobiological studies of personality. 
Eysenck's (1967) "Big Three" (extraversion, neuroticism, psychoticism) were an 
obvious starting point. Ostensible measures of temperament such as the Buss- 
Plomin (1975) scales for emotionality, activity, sociability, and impulsivity, and 
scales from Strelau's (1983) Temperament Inventory, based on Pavlovian theory, 
were included. Sensation seeking is a trait that has shown a high heritability and 
many psychophysiological and biochemical correlates suggesting a biological basis 
for the trait (Zuckerman, 1979; 1984; 1994a). However, the most widely used form 
of this scale contains four subfactors, and some studies had suggested that the 
biological bases were sometimes specific to one or another of the factors; therefore all 
four were included in our analyses. Similarly, impulsivity has been measured in a 
number of different ways and we felt it was necessary to include several markers for 
this trait. Emotional traits like anxiety and aggression-hostility are considered basic 
to most theories of temperament and such measures were also included. Measures of 
socialization and responsibility were also included. Measures of social desirability 
were also included to be sure that none of the basic factors were primarily defined 
by this response set. 

In total we included at least three scale markers for each of nine hypothesized 
factors: sociability, general emotionality (neuroticism), anxiety, hostility, socializa- 
tion, sensation seeking, impulsivity, activity, and social desirability. No measures of 
cultural interests or intellectual styles were included because of our conception of 
basic personality traits as comparative to traits in other species (Zuckerman, 1984). 
This is why we could not find a trait like "Openness to Experience," one of the Big 
Five. 


Procedure of development 


Based on the rationale described above, a total of 46 scales were selected from eight 
different questionnaires to represent the hypothesized factors with anywhere from 
three to nine potential markers for each factor (Zuckerman, Kuhlman, & Camac, 
1988). The subjects were 271 students (73 men and 178 women) from an under- 
graduate class in personality psychology. Both oblique and orthogonal rotations 
were used with nearly identical results from both. A scree test suggested that four or 
five factors would be sufficient, but in order to clarify the possible hierarchical nature 
of the structure we analyzed the results at the seven-, five-, and three-factor levels. 
In this way we could see how primary factors merged to form the superordinate 
factors going from the seven- to three-factor levels. Figure 1 shows the correlations 
between factors across the three levels. 

At the three-factor level Eysenck's Big Three were clearly identified. His E and 
N scales had the highest loadings on the first two factors and his P scales had the 
second highest loading on the third factor. The E factor contained scales measuring 
sociability and activity. The N factor was comprised of scales for neuroticism, anxi- 
ety, anger, hostility, general emotionality, lack of emotional control, and work effi- 
ciency. Other than P itself, the P factor consisted of a scale for autonomy or inde- 
pendence, nearly all of the sensation seeking and impulsivity subscales, and at the 
opposite pole, scales for socialization, planning, responsibility, restraint, and social 
desirability. This factor was labeled Impulsive Unsocialized Sensation Seeking (Im- 
pUSS). These three factors were compared across genders and were nearly identical 
in structure in men and women. 
Figure 1. Factors at each level (three-, five-, and seven-factor analyses) and factor score 
correlations across levels. (E = Extraversion, Sy = Sociability, N = Neuroticism, Emot = 
Emotionality, P = Psychoticism, ImpUSS = Impulsive, Unsocialized Sensation Seeking, Act = 
Activity, AggSS = Aggressive Sensation Seeking, Anx = Anxiety, Aut vs Conform = Autonomy vs. 
Conformity, Imp = Impulsivity. From 'What lies beyond E and N? Factor analyses of scales 
believed to measure basic dimensions of personality' by M. Zuckerman, D. M. Kuhlman, & C. 
Camac, 1988, Journal of Personality and Social Psychology, 54, figure 2, p. 103. Copyright 
1988 by American Psychological Association.) 


In the five-factor analysis the extraversion factor split into its sociability and ac- 
tivity components and the P-ImpUSS factor was split into impulsive and aggressive 
sensation seeking factors. In the seven-factor solution the neuroticism factor split 
into separate anger and anxiety factors. 

A second study was done in order to sharpen the hierarchical model with a larger 
sample of subjects (N = 525) and a reduced number of scales (33) with several 
markers for each of the narrower factors revealed in the first study (Zuckerman, 
Kuhlman, Thornquist, & Kiers, 1991). This time factor rotations were done for 
three, four, five, six, and seven factors. Figure 2 shows the correlations between 
factor scores of subjects across the three- to six-factor levels. The seven-factor solu- 
tion had a factor consisting of only one scale and therefore was ignored in further 
analyses. Separate sociability and activity factors emerged at the six-factor level. 
The sociability factor remained unchanged through the five-, four-, and three-factor 
levels. The activity factor was largely absorbed into the N-Anxiety factor in the 
four-factor analysis, but shifted to the sociability factor in the superordinate 
three-factor analysis. N-Anxiety and Aggression-Hostility formed two separate 
factors from the six- to the four-factor levels, but at the three-factor level the hostility 
scales shifted to the N-Anxiety and the aggression scales to the P-ImpUSS factor. At 
the six-factor level sensation seeking and impulsivity scales formed separate factors 
with the P scale itself more strongly attached to the sensation seeking factor. But 
from the five-factor through the three-factor solutions the sensation seeking and 
impulsivity scales were merged into a common P-ImpUSS factor. 

Figure 2. Factors at each level (three-, four-, five-, and six-factor analyses) and factor score 
correlations across levels for the total group. (N = Neuroticism, Agg-Host = Aggression-Hostility, 
Emotion = Emotionality, PUss = Psychopathy (Psychoticism) - Impulsive Unsocialized Sensation 
Seeking, Imp = Impulsivity.) The strongest loading scales defining each factor at six- and 
three-factor levels are indicated. From "Personality from top (traits) to bottom (genetics) with 
stops at each level between" by M. Zuckerman in Foundations of Personality (Figure 3, p. 77) 
edited by J. Hettema & I.J. Deary, 1993, Dordrecht, Netherlands: Kluwer Academic Publishers. 
Copyright 1993 by Kluwer Academic Publishers. Reproduced with permission of Kluwer. 

All of the above results were based on the combined gender group. Factor 
reliabilities were the primary determinants of the analyses to use in the development of 
a questionnaire. Certainly, one would want factors that were the same for men and 
women. We therefore calculated congruency coefficients comparing male and fe- 
male participants. Both three- and five-factor solutions were equally robust with 
average coefficients from .95 to .96 for corresponding factors and low coefficients 
for divergent factors averaging close to zero. However, the four-factor solution was 
not as reliable and in the six-factor solution the impulsivity factor could not even be 
identified in women. In view of these results we decided to proceed with the five- 
factor results because they offered the maximum specificity with no reduction of 
factor reliability. This is what led to the combination of impulsivity and sensation 
seeking in a single scale. Apart from the high coherence of these traits from the five- 
factor analysis up to the three-factor one, the theory and past research suggested that 
this was "a marriage of traits made in biology" (Zuckerman, 1993b). 
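
The comparison of male and female factor solutions can be illustrated with a short sketch. We assume here that the congruency coefficients were of the Tucker type (the cosine between two columns of factor loadings); the text does not name the formula, and the loadings below are hypothetical values chosen only to show the calculation.

```python
import math


def congruence(x, y):
    """Tucker-type congruence coefficient between two loading vectors."""
    if len(x) != len(y):
        raise ValueError("both vectors must cover the same scales")
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return numerator / denominator


# Hypothetical loadings of six scales on one factor (not data from the chapter).
men = [0.80, 0.75, 0.70, 0.10, -0.05, 0.00]
women = [0.78, 0.70, 0.74, 0.05, 0.02, -0.03]
print(round(congruence(men, women), 3))
```

A coefficient near 1 indicates that the same scales define the factor in both groups, and a coefficient near zero is what one expects for non-corresponding factors.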


Development of the ZKPQ full scale 


The items from all of the scales used in the 1991 study, except for those in the 
Eysenck Personality Questionnaire (EPQ), were correlated with the five-factor 
scores, calculated for each subject (N = 522). The five factors were: Impulsive Sen- 
sation Seeking (ImpSS), Sociability (Sy), Neuroticism-Anxiety (N-Anx), Aggres- 
sion-Hostility (Agg-Host), and Activity (Act). Twenty items were selected to repre- 
sent each factor on the basis of their demonstrating high correlations with that factor 
and lower correlations with other factors and a social desirability scale. At this point 
some of the items were rewritten. This first form of the ZKPQ containing 100 items 
was given to a new group of 589 subjects and the items were factor analyzed. Scree 
tests unambiguously indicated the appropriateness of a five-factor solution. Of the 
100 items selected on the basis of item-total score correlations, 89 loaded signifi- 
cantly and primarily on the factors to which they had been previously assigned. 
However, some of the items in the Sy scale had to be rewritten because of an ex- 
treme skewness in the distribution of scale scores. Ten new items were added for a 
validity scale to eliminate individual records influenced by an extreme social desir- 
ability set. The final form of the ZKPQ therefore consists of 99 true-false items and 
is usually completed in 15 to 20 minutes testing time. 
Facet scores for the ZKPQ 


Factor analyses were done on the items within each of the five major scales de- 
scribed above in order to see if facet or subfactor scales within the major scales 
could be derived. For two of the scales, N-Anx and Agg-Host, the scree plots indi- 
cated the sufficiency of a one-factor solution. For the remaining three scales, ImpSS, 
Sy, and Act, two-factor solutions were indicated. The Sy scale contained factors for 
(1) liking lively parties and friends, and (2) intolerance of social isolation. The 
ImpSS scale factors were (1) sensation seeking, and (2) impulsivity (particularly of 
the nonplanning type). The two factors in the Act scale were (1) need for general 
activity, and (2) need for work activity. 


Scale descriptions 


Impulsive Sensation Seeking (ImpSS) 


This scale has 19 items. The impulsivity items describe a lack of planning and a 
tendency to act quickly on impulse without thinking. The sensation seeking items 
describe a general need for thrills and excitement, a preference for unpredictable 
situations and friends, and the need for change and novelty. Unlike earlier forms of the 
sensation seeking scale (forms II, IV, and V) this scale contains no items mentioning 
specific activities like drinking, drugs, sex, or risky sports. Such items were elimi- 
nated to avoid confounding in studies of persons who actually engage in one or an- 
other of these activities. 


Neuroticism-Anxiety (N-Anx) 


The 19 items in this scale describe emotional upset, tension (i.e., "I sometimes feel 
edgy and tense"), worry, fearfulness, obsessive indecision (i.e., "I often have trouble 
trying to make choices"), lack of self-confidence, and sensitivity to criticism (i.e., "I 
tend to be sensitive and easily hurt by thoughtless remarks and actions of others"). 


Aggression-Hostility (Agg-Host) 


About half of the 17 items of this scale reflect a readiness to express verbal aggres- 
sion (i.e., "It is natural for me to curse when I am mad"). Other items include rude, 
thoughtless or antisocial behavior (i.e., "If people annoy me I do not hesitate to tell 
them so"), vengefulness, spitefulness, a quick temper and impatience with others 
(i.e., "When people disagree with me I cannot help getting into an argument with 
them"). 
Sociability (Sy) 


This scale has 17 items. One group of items describes a liking of big parties, inter- 
acting with many people (i.e., “I tend to start conversations at parties") and having 
many friends. The second group indicates an intolerance for social isolation in 
highly sociable subjects and a liking or tolerance for isolation in unsociable subjects 
(i.e., "I would not mind being socially isolated in some place for some period of 
time"). 


Activity (Act) 


The activity scale has 17 items. The first factor describes the need for general activ- 
ity and impatience and restlessness when there is nothing to do (i.e., "I like to keep 
busy all the time"). The second factor indicates a preference for challenging and 
hard work (i.e., "I like a challenging task more than a routine one") and a lot of energy 
for work and other tasks (i.e., "When I do things I do them with lots of energy"). 


Infrequency 


Infrequency (10 items). This is not a scale but it is used to eliminate subjects with 
possibly invalid records. The items are mostly keyed true and, if endorsed, indicate 
exaggerated socially desirable content, unlikely to be true (i.e., "I never met a person 
I didn't like", "I have always told the truth"). Scores higher than 3 are considered to 
indicate questionable validity for that record. 
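
The scale lengths and the Infrequency cutoff given above can be put together in a minimal scoring sketch. The item counts and the cutoff of 3 come from the text; the item identifiers, the key, and the answers are hypothetical, since the actual item key is available only from the authors.

```python
# Item counts per scale, as stated in the scale descriptions.
SCALE_LENGTHS = {
    "ImpSS": 19, "N-Anx": 19, "Agg-Host": 17,
    "Sy": 17, "Act": 17, "Infrequency": 10,
}
INFREQUENCY_CUTOFF = 3  # scores higher than 3 suggest questionable validity


def score_scale(responses, key):
    """Count the items answered in the keyed direction.

    responses: dict mapping item id -> True/False answer.
    key: dict mapping item id -> keyed answer for one scale.
    Unanswered items simply do not add to the score.
    """
    return sum(1 for item, keyed in key.items() if responses.get(item) == keyed)


def is_questionable(infrequency_score):
    """Apply the cutoff described in the text."""
    return infrequency_score > INFREQUENCY_CUTOFF


# Hypothetical four-item key and answers, for illustration only.
toy_key = {"q1": True, "q2": True, "q3": False, "q4": True}
toy_answers = {"q1": True, "q2": False, "q3": False, "q4": True}
print(sum(SCALE_LENGTHS.values()), score_scale(toy_answers, toy_key))
```

The six item counts sum to the 99 items of the final form, which is a quick consistency check on any scoring key.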


Norms 


T scores and percentile norms are available in an unpublished form from the authors 
based on 1,144 male and 1,825 female college students. Means, standard deviations, 
and frequency distributions are also provided. A copy of the ZKPQ is available from 
the authors. 
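
For readers unfamiliar with T scores, the conversion can be sketched as follows. We assume the conventional linear T score (mean 50, standard deviation 10 in the normative group); whether the unpublished norm tables use a linear or a normalized transformation is not stated here, and the normative mean and standard deviation in the example are hypothetical.

```python
def t_score(raw, norm_mean, norm_sd):
    """Linear T score: 50 plus 10 times the z score in the normative group."""
    if norm_sd <= 0:
        raise ValueError("normative standard deviation must be positive")
    return 50.0 + 10.0 * (raw - norm_mean) / norm_sd


# Hypothetical normative values, for illustration only.
print(t_score(raw=14, norm_mean=10.0, norm_sd=4.0))
```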


Short form (ZKPQ-S) 


A short form, consisting of 35 items (7 items for each of the 5 major factors) has 
been developed by Zuckerman and Kuhlman. We began by analyzing the items with 
the highest correlations with the total scores on each of the five factors and the 
greatest response variance, using the normative sample referred to above. On the basis 
of these data the seven highest correlating items for each factor were selected for the 
short form. Some items were eliminated because of redundancy of content and the 
next highest correlating items were substituted. 
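
The selection rule just described (take the highest correlating items, and when an item is redundant with one already chosen, substitute the next highest) can be sketched as a simple greedy procedure. This is an illustration under stated assumptions: redundancy was a judgment about item content, so the `redundant_with` mapping is a hypothetical stand-in for that judgment, the item names and correlations are invented, and the response-variance criterion is left out for brevity.

```python
def select_short_form(item_total_r, redundant_with, n_items=7):
    """Pick the n_items highest item-total correlations, skipping redundant items.

    item_total_r: dict item id -> correlation with the scale total.
    redundant_with: dict item id -> set of item ids judged redundant with it.
    """
    chosen = []
    for item in sorted(item_total_r, key=item_total_r.get, reverse=True):
        clashes = any(
            item in redundant_with.get(kept, set())
            or kept in redundant_with.get(item, set())
            for kept in chosen
        )
        if clashes:
            continue  # the next highest correlating item is tried instead
        chosen.append(item)
        if len(chosen) == n_items:
            break
    return chosen


# Hypothetical items and correlations, for illustration only.
toy_r = {"a": 0.70, "b": 0.65, "c": 0.60, "d": 0.50}
print(select_short_form(toy_r, {"a": {"b"}}, n_items=2))
```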
The 35-item short form was given to new samples of students, 208 males and 820 
females, from the same college population. A factor analysis of the 35 items among 
the females confirmed the assignments of every item to its particular scales based on 
the previous analyses. There were some differences in results among the males, 
perhaps due to the smaller N. A copy of this form with the college norms for it is 
available from the authors. 


Reliabilities 


Internal reliability (Alpha) 


Table 1 shows the internal reliabilities (Cronbach's alphas) for the total and sub- 
scales in the American sample and a Spanish translation (Zotes, 1999), and for 
translations of the total scales in Japanese (Shiomi, Kuhlman, David, Zuckerman, & 
Joireman, 1996), German (Ostendorf & Angleitner, 1994), Chinese (Wu, Wang, 
Du, Li, Jiang, & Wang, 2000), and Catalan (Goma-Friexenet, 2000). The last sample 
was from the area of Spain around Barcelona where Catalan is the common 
language rather than Spanish. The reliabilities for the short-form described above are 
also shown in Table 1. 

All of the alphas for males and females in the American sample are good with 
most ranging between .70 and .80. The only questionable one is for the subscale of 
General Activity in the female group (.56). The results were similar for the Spanish 
sample except for low reliabilities for the subscale of Work Effort. N-Anx has the 
highest reliabilities in all samples, above .79 for both males and females. The Japa- 
nese, German, and Catalan samples had good reliabilities for the total scales, but the 
Chinese sample had lower reliabilities; those for four of the scales were in the .60s. 
Only the reliability for N-Anx was high (.81). 
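
For reference, Cronbach's alpha can be computed from an item-by-respondent score matrix as in the sketch below. The formula is the standard one; the small true-false data set is hypothetical and serves only to show the calculation.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows is a list of respondents, each a list of item scores."""
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("need at least two items and equal-length rows")
    item_variances = [pvariance([row[i] for row in rows]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of four respondents to three true-false items (1 = keyed answer).
toy_rows = [[1, 1, 0], [1, 0, 0], [0, 0, 0], [1, 1, 1]]
print(cronbach_alpha(toy_rows))
```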

The reliabilities of the short form of the ZKPQ were all satisfactory ranging from 
.62 to .78 for males and .67 to .73 for females. As expected from reduction in the 
length of scales, the reliabilities were somewhat less for the short than for the long 
form for four of the five scales. On Sy they are about the same. 
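
The expectation that shorter scales are less reliable can be made concrete with the Spearman-Brown formula. The formula is our choice of illustration, not a calculation reported in the chapter; the alpha of .80 is a hypothetical full-scale value, and the formula assumes that the retained items are of average quality, which is not the case when the best items are deliberately selected.

```python
def spearman_brown(reliability, length_ratio):
    """Predicted reliability after changing test length by length_ratio."""
    return length_ratio * reliability / (1 + (length_ratio - 1) * reliability)


# Hypothetical case: a 19-item scale with alpha .80 cut to 7 items.
print(round(spearman_brown(0.80, 7 / 19), 2))
```

Because the short-form items were those with the highest item-total correlations, observed short-form alphas can be higher than this kind of prediction.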


Retest reliabilities 


American students (N = 153) were tested twice on the ZKPQ with an interval of 
three to four weeks between tests. Retest reliabilities were: ImpSS, .80; N-Anx, .84; 
Agg-Host, .78; Act, .76, and Sy, .83. Retest reliabilities for males and females were 
quite similar. 
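
A retest reliability of this kind is the Pearson correlation between scores at the first and second testing. A minimal sketch follows; the two score lists are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical scale scores of five subjects at test and retest.
first = [12, 8, 15, 5, 10]
second = [11, 9, 14, 6, 12]
print(round(pearson_r(first, second), 2))
```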
Table 2. Four-factor analysis of NEO, ZKPQ, and EPQ personality scales 

                                      Factor Loadings 
Scale                     Factor 1      Factor 2      Factor 3      Factor 4 
NEO Extraversion            .88           -.14          -.05           .17 
EPQ Extraversion            .79           -.32           .17          -.08 
ZKPQ Sociability            .76           -.16           .10          -.07 
ZKPQ Activity               .60            .01          -.18           .02 
ZKPQ N-Anxiety             -.13            .92          -.01           .08 
NEO Neuroticism            -.15            .90           .10          -.11 
EPQ Neuroticism            -.16            .91          -.04          -.08 
NEO Conscientiousness       .15           -.07          -.86          -.02 
EPQ Psychoticism           -.09           -.08           .80          -.28 
ZKPQ ImpSS                  .48            .08           .74          -.02 
NEO Agreeableness          -.04           -.07          -.31           .81 
ZKPQ Agg-Host               .35            .34           .24          -.72 
NEO Openness            [illegible]        .14           .18           .67 

Note: From "A comparison of three structural models for personality: The Big Three, the Big Five, 
and the Alternative Five," by M. Zuckerman, D. M. Kuhlman, J. Joireman, P. Teta, and M. Kraft, 
1993, Journal of Personality and Social Psychology, 65, p. 762. Copyright 1993 by the American 
Psychological Association. Reprinted with permission. 


Gender differences 


Significant gender differences were found in the normative American samples of 
college students. Men scored higher on ImpSS, Agg-Host, and Act; women were 
higher on N-Anx and Sy. On the short form men scored higher on ImpSS and 
women were higher on N-Anx. 


Validity 


Convergent and discriminant validity 


The ZKPQ, Costa and McCrae's (1992) NEO-PI-R, and the Eysenck Personality 
Questionnaire (EPQ-R; Eysenck, Eysenck, & Barrett, 1985) scales were intercorre- 
lated and subjected to a factor analysis in order to see the extent of overlap between 
the two five-factor models and Eysenck's three-factor one. The subjects were 157 
undergraduate students. Four factors accounted for 74 per cent of the variance and 
additional factors added little to the solution. Table 2 shows the results of the vari- 
max rotated four-factor analysis. 

The results show a high degree of convergence between the factors as represented 
by scales in each of the three tests. Measures of neuroticism were virtually equivalent in 
all three tests with loadings of .90 and above on factor 2. Convergence was also high 
on the other factors: .76 to .88 for extraversion and sociability scales (factor 1); -.86, 
.74, and .80 for conscientiousness, impulsive sensation seeking, and psychoticism 
scales (factor 3); and .81 and -.72 respectively for the agreeableness and aggression- 
hostility scales (factor 4). The fifth factor in the NEO, openness to experience, 
loaded on factor 4 and the fifth factor in the ZKPQ, activity, loaded on the extraver- 
sion factor 1. A five-factor solution showed no convergence of openness and activity 
in the fifth factor. Discriminant validity was also good: most loadings on irrelevant 
factors were very low; one exception was the secondary loading of ImpSS on the 
extraversion factor 1. 

Bivariate correlations of the ZKPQ scales with the NEO, EPQ, EASI (Buss & 
Plomin, 1984), the SSS form V (Zuckerman, Eysenck, & Eysenck, 1978), and Ego- 
Control and Ego-Resilience scales (Block & Block, 1980) are shown in Table 3. The 
N for the correlations of the ZKPQ with the NEO and EPQ was 157. The Ns for the 
correlations with the other scales varied from 135 to 177, depending on the numbers 
of subjects who took particular tests. ZKPQ Sy correlated very highly with EPQ and 
NEO E scales, and with EASI Sociability. N-Anx correlated very highly with EPQ 
and NEO N scales, and with EASI Emotionality. ImpSS correlated highly with EPQ 
P, NEO Conscientiousness, with EASI Impulsivity, Block's Undercontrol, and the 
SSSV Total score. Correlations between ImpSS and the SSS subscales, with the ex- 
ception of Boredom Susceptibility (BS), were all moderate and close in magnitude. 
Agg-Host correlated highly and inversely with the NEO Agreeableness scale, and 
moderately with EASI Emotionality and Impulsivity, and lower with several other 
scales including NEO N and EPQ N and P. ZKPQ Act correlated highly with EASI 
Act and low with EPQ E and NEO E and Conscientiousness. NEO Openness did not 
correlate with any of the ZKPQ scales. ZKPQ Sy, N-Anx, ImpSS, and Act all show 
good convergent and discriminant validity. Agg-Host had good convergent validity 
with NEO Agreeableness. 

Cloninger's personality model resembles the alternative five model in many re- 
spects. Zuckerman and Cloninger (1996) correlated the scales of the ZKPQ with 
those of the Temperament and Character Inventory (TCI; Cloninger, Przybeck, 
Svrakic, & Wetzel, 1994). Four of the five ZKPQ scales showed good convergent 
validity with four of the TCI scales. ImpSS correlated .68 with TCI Novelty Seek- 
ing; N-Anx correlated .66 with Harm Avoidance; Agg-Host correlated -.60 with 
Cooperativeness; Act correlated .46 with Persistence. All of these were markedly 
higher than correlations with other scales of the ZKPQ and TCI (discriminant valid- 
ity). A curious thing about the Cloninger model is that it has no scale for extraver- 
sion or sociability. The ZKPQ Sy scale correlated .37 with Novelty Seeking and -.38 
with Harm Avoidance. The EPQ Extraversion scale, also used in this study, showed 
the same pattern of correlations with the two TCI scales. 
Table 3. Correlations of ZKPQ Scales with EASI, SSS and Block Ego Undercontrol and Ego Resilience 
Scales. 

Test    Scale                     ImpSS         Sy            N-Anx         Agg-Host      Act 
EPQ     Psychoticism              [illegible]   -.05          -.05          [illegible]   -.05 
EPQ     Extraversion              .28**         [illegible]   -.24**        .18           .36 
EPQ     Neuroticism               .01           -.21*         .79**         [illegible]   -.13 
NEO     Conscientiousness         -.51          -.04          -.09          -.13          [illegible] 
NEO     Extraversion              .28**         [illegible]   -.24**        .13           .36** 
NEO     Neuroticism               .01           -.21          .79**         [illegible]   -.13 
NEO     Agreeableness             -.23**        -.05          .04           -.63**        .09 
NEO     Openness                  .00           .06           .00           -.14          [illegible] 
EASI    Emotionality              -.11          -.13          .68**         .46**         -.14 
EASI    Sociability               [illegible]   .67**         -.28**        .24**         [illegible] 
EASI    Impulsivity               [illegible]   [illegible]   -.06          .42**         -.03 
EASI    Activity                  .08           [illegible]   -.18          [illegible]   .59** 
SSS     Total Score               .66**         .20*          -.10          .31**         .01 
SSS     Thrill & Adventure        .49**         [illegible]   -.24**        [illegible]   [illegible] 
SSS     Experience Seeking        .46**         -.05          .00           [illegible]   -.12 
SSS     Disinhibition             .48**         [illegible]   .05           .36**         -.09 
SSS     Boredom Susceptibility    [illegible]   [illegible]   -.08          [illegible]   [illegible] 
Block   Ego Undercontrol          .63**         .24**         -.15          .30           .11 
Block   Ego Resiliency            .14           [illegible]   [illegible]   -.24**        [illegible] 

Note: Correlations in bold-face are those showing good convergent validity with similar scales; * p < 
.05, two-tailed test; ** p < .01, two-tailed test 
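
As a rough guide to reading the significance markers at these sample sizes, the smallest correlation reaching a given two-tailed level can be approximated with the Fisher z transformation. This is an approximation we add for orientation, not the test used for the table, and the function name is ours.

```python
import math
from statistics import NormalDist


def approx_critical_r(n, alpha=0.05):
    """Approximate smallest |r| significant at level alpha (two-tailed), via Fisher z."""
    if n <= 3:
        raise ValueError("sample size must exceed 3")
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return math.tanh(z_crit / math.sqrt(n - 3))


for n in (135, 157, 177):  # sample sizes mentioned for these correlations
    print(n, round(approx_critical_r(n), 3), round(approx_critical_r(n, 0.01), 3))
```

With samples of this size, correlations of modest absolute magnitude already reach significance, so the size of a coefficient matters more than its asterisks when judging convergent validity.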


Motivational and emotional traits 


In a number of personality models it is suggested that personality traits represent 
expressions of more basic motivational, cognitive, or emotional traits. Gray (1982), 
for instance, has proposed two basic dimensions: anxiety and impulsivity. The former 
is based on sensitivity to signals of punishment and the latter on sensitivity to 
signals of reward. A third system, fight-flight, is expressed in aggression or anger 
and based on sensitivity to signals of punishment or non-reward. Tellegen (1985) 
identifies two major dimensions associated with emotions: positive emotionality 
associated with extraversion and negative emotionality associated with neuroticism. 
His third dimension is constraint, associated with behavioral inhibition versus im- 
pulsivity. Costa and McCrae (1992) also identify extraversion with positive emo- 
tions and warmth, and neuroticism with anxiety, depression, and hostility, although 
they include other kinds of traits as facets of these two primary factors. Among 
Zuckerman's (1991) five basic factors are: sociability, identified with behavioral 
approach tendencies, generalized reward expectancy, and positive affect; and neuroticism, 
associated with a behavioral inhibition mechanism, generalized punishment 
expectancy, and emotions of anxiety and depression. Impulsive sensation seeking 
was hypothesized to be associated with behavioral disinhibition and the emotion of 
anger and the attitude of hostility. 

These models, together with Eysenck's basic three factors, were related to meas- 
ures of motivational and emotional traits in a study by Zuckerman, Joireman, Kraft, 
and Kuhlman (1999). Motivational traits of sensitivities to reward and punishment 
were measured by scales developed by Torrubia et al. (1995) based on Gray's 
model. Generalized reward and punishment expectancy scales were those developed 
by Ball and Zuckerman (1990). Trait affect scales for anxiety, depression, hostility, 
positive affect, and sensation seeking (surgent) affect were assessed using the re- 
vised Multiple Affect Adjective Check List (MAACL-R; Zuckerman & Lubin, 
1985; Lubin & Zuckerman, 1999). 

The associations between personality, motivational, cognitive, and emotion traits 
were investigated using factor analysis with replication using Procrustes rotations to 
target. An extraversion factor (EPQ E, ZKPQ Sy) was associated with generalized 
reward expectancy, sensitivity to signals of reward, and both surgent and positive 
affect. The neuroticism factor (EPQ N, ZKPQ N-Anx) was strongly related to gen- 
eralized punishment expectancy, sensitivity to signals of punishment, and trait anxi- 
ety. The third factor, a combination of the EPQ P scale and the ZKPQ ImpSS and 
Agg-Host scales, was weakly related to sensitivity to reward and strongly related to 
trait hostility. Trait depression was equally related to the neuroticism and P-ImpSS- 
Agg-Host factors. Although there were some secondary loadings of the motiva- 
tional, cognitive, and emotion scales on the factors other than the primary ones de- 
scribed above, the general pattern supports the discriminant as well as the conver- 
gent validity of the measures. 


Construct, concurrent, and predictive validity 


Psychopathy 


Thornquist and Zuckerman (1995) rated prison inmates enrolled in a drug program 
for psychopathy using the Hare (1991) Psychopathy Check List. The participants 
were also evaluated on a passive-avoidance learning task developed by Newman and 
Kosson (1986). All subjects were given the ZKPQ. Psychopathy ratings correlated 
with ImpSS in White but not in African-American or Hispanic groups. ImpSS cor- 
related with passive avoidance errors (learning not to respond to signals of punish- 
ment) across all subjects. None of the other scales in the ZKPQ were related to ei- 
ther the psychopathy rating, based on case history and interview, or the deficit in 
passive-avoidance learning in the experimental setting. 
Drug Abuse 


Ball (1995) gave the ZKPQ to 450 cocaine abusers seeking treatment in an outpa- 
tient facility. Unlike the relative independence of the ZKPQ subscales in the college 
population, ImpSS, N-Anx, and Agg-Host subscales were substantially correlated in 
the substance abuser sample. The high intercorrelation of these scales may have 
something to do with their immediate circumstance of seeking treatment. Early use 
of cocaine was related to ImpSS, Agg-Host, and Act. Number of past treatment epi- 
sodes was related to N-Anx. Severity of drug abuse and addiction was related to 
ImpSS, N-Anx, and Agg-Host. Other psychiatric problems were also related to these 
three subscales. Only Agg-Host was related to a history of violence. 

Treatment outcome (predictive validity) was predicted by the ImpSS, Agg-Host, 
and N-Anx scales. Cocaine abusers who continued using cocaine during treatment 
had scored higher on ImpSS and Agg-Host on admission. There was also a marginal 
effect for Agg-Host. Early drop-outs scored higher on Agg-Host than those patients 
completing treatment. ImpSS was the only scale correlated with number of treatment 
appointments kept. Cocaine abusers who scored higher on ImpSS were less success- 
ful at remaining at least one month in treatment and were also judged in need of in- 
patient treatment. Cocaine abusers referred for treatment also scored higher on N- 
Anx and Agg-Host. 

Cluster analyses of the ZKPQ scales yielded two subtypes, one characterized by 
patients scoring higher on ImpSS, Agg-Host, and N-Anx and lower on Sy. This 
subtype scored higher than the other subtype on drug-abuse, family, and psychiatric 
severity. The other subtype was primarily men stipulated by the criminal justice 
system, not abused as children, and free of psychiatric symptoms, or in other words 
"normal" criminals. We would hypothesize that the members of the subtype with elevated 
ZKPQ scores had personality disorders, primarily of the antisocial personality type. 

Black (1993) used the ZKPQ to study drug abusers entering an outpatient drug 
treatment facility. Based on their drug histories the clients were divided into primary 
users of alcohol, cocaine, or marijuana. Primary alcohol users were higher on Sy 
than cocaine users, whereas the primary cocaine users were significantly higher on 
N-Anx and Agg-Host. There were no significant differences between primary mari- 
juana users and the other two groups on any of the scales. Those who successfully 
completed the program were higher on Sy and those who were violators of the pro- 
hibition on drug use during the program were higher on N-Anx and Agg-Host. The 
results of this study were similar to those of Ball except for the absence of ImpSS as 
a correlate of cocaine use or a predictor of therapy outcome. 


Prostitution 


Studies of drug abusers in treatment programs confound long term personality 
characteristics with the stress of the program and the implicit demand to admit 
psychopathology. These situational features have their greatest effects on scales of 
neuroticism, which are elevated in the early parts of the programs but markedly reduced in 
those who remain in the program (Zuckerman, Sola, Masterson, & Angelone, 1975). 

We had the opportunity to test a group of prostitutes who were actively practicing 
their profession on a highway leading out of a city, soliciting motorists and truck 
drivers at bars and restaurants (Sullivan, Zuckerman, & Kraft, 1996). This was an 
unusually risk-taking group because a year before the study a serial killer had been 
murdering prostitutes along the same highway and some of the women knew the 
victims. More than half of the group were cocaine users. The prostitutes were inter- 
viewed and tested in a diner and paid $10 for their participation. Their ZKPQ scores 
were compared with those of a control group composed of food service workers at a 
university. Despite an attempt to match for race, marital status, age, and education 
the controls were still significantly older and more educated. 

The prostitutes scored significantly higher than the controls on ImpSS, N-Anx, 
and Agg-Host, but after controlling for age and education differences only the 
difference on ImpSS (p < .001) remained significant, although the difference on 
Agg-Host approached significance (p = .08). Cocaine users among the prostitutes scored 
significantly higher than non-drug or other drug users on ImpSS. ImpSS was also 
higher in polydrug than in no drug or one drug users. This relationship between sen- 
sation seeking and number of drugs used has been found in many other studies using 
the SSS (Zuckerman, 1994). 


General and specific risk-taking 


Zuckerman and Kuhlman (2000) studied the relationships between personality, us- 
ing the ZKPQ, and risk-taking among college students in six areas: smoking, drink- 
ing, drugs, risky sex, reckless driving, and gambling. Most of these types of risk- 
taking, particularly the first four, were significantly intercorrelated. A composite 
risk-taking score was constructed for each subject. High general risk-takers were 
higher than medium and low risk-takers on ImpSS, Agg-Host, and Sy. These three 
scales independently predicted general risk-taking. Gender differences on risk-taking 
were mediated by ImpSS only. Analyzing the types of risk-taking separately, we 
found that whereas all three of the above described traits were related to drinking, 
only ImpSS was independently related to smoking and drug use, only ImpSS and 
Agg-Host were related to risky sex, only Agg-Host and low N-Anx were related to 
reckless driving, and only Sy was related to gambling. It is interesting that in this 
general population N-Anx did not appear as a predictor of substance use. This sup- 
ports our belief that anxiety appears as a predictor of substance use only in groups 
asking for or actually in treatment. ImpSS is the main predictor of drug abuse and 
antisocial forms of behavior sometimes accompanied by elevated Agg-Host. 

The lack of association between ImpSS and reckless driving was not expected 
because of a large literature relating the SSS to driving violations, reported speed of 
driving, and even behavioral observations of reckless driving (Zuckerman, 1994). 
Although ImpSS correlated equally with the Thrill and Adventure Seeking (TAS), Experience
Seeking, and Disinhibition subscales of the SSS (see Table 2), some as yet
anecdotal evidence has suggested that the ImpSS scale is not picking up the TAS
component measured in the SSS. Some persons, above college age, who report engaging
in many kinds of risky or extreme sports do not score high on ImpSS, and
some who are actually averse to any kind of physical risk-taking manage to get
high scores on ImpSS. Studies of ImpSS scores in persons engaging in extreme 
sports have not been done. Perhaps the impulsive part of ImpSS is not characteristic 
of persons who take these kinds of physical risks, but may be limited to those who 
take other kinds of physical risk like smoking, drug use, or criminal activity. 


Gambling 


Sensation seeking, as defined by the SSS, has been related to gambling activity in 
the general population, although not to pathological gambling as represented by 
those who participate in Gamblers Anonymous or enter therapy for the compulsion. 
A recent community-wide study used only the ImpSS scale from the ZKPQ 
(McDaniel & Zuckerman, unpublished). Both gender and age were strong determi- 
nants of ImpSS scores; ImpSS was higher in men than in women and decreased with 
age in both sexes. ImpSS was significantly related to gambling interest and variety 
of gambling activities in both men and women. There were some differential rela- 
tionships between ImpSS and specific types of gambling in men and women. ImpSS 
correlated with sports betting and video poker playing in both sexes, but it correlated 
with slot machine playing and off-track betting only in women and with lottery 
playing only in men. 

Breen and Zuckerman (1999) used the ImpSS in a study of gambling behavior in 
a controlled laboratory paradigm. The outcomes of participants' bets were
fixed at a decreasing rate of winning over trials so that all participants started by
winning at a high rate but the pay-off gradually decreased with each successive 
block of trials. Subjects could quit at any time. Those who persisted until they lost 
all of their initial stake were called “chasers” and those who quit before losing all of 
their starting money were termed “non-chasers.” The Imp component of the ImpSS
differentiated chasers from non-chasers (the chasers were higher), but the SS
component did not. The study illustrates why it may be important to look at the facet 
scores separately as well as at the total score. 


Team sport participants 


Sensation seeking, as measured by the SSS, has not been found to be high in partici- 
pants in ordinary sports or physical activities even if these are moderately risky, but 
it is high in participants of extreme sports like sky diving, scuba-diving, hang- 
gliding, mountain climbing, etc. We used the ZKPQ to investigate the personality 
profiles in male and female participants in team sports. Male members of baseball 
and football teams, and female members of field hockey and lacrosse and equestrian 
teams were given the ZKPQ. They were compared with general college norms from 
the school they attended (O'Sullivan, Zuckerman, & Kraft, 1998).

Members of all of the teams were characterized by a distinctive profile on the 
ZKPQ. All four teams were significantly higher on Activity and lower on the
Neuroticism-Anxiety scales than the general college population. The fact that athletes of
all types are high on Act supports the construct validity of this scale. The low scores 
on N-Anx may represent the lack of fear since physical harm is a risk in most of 
these sports. Members of the male teams actually scored lower than the general col- 
lege population on ImpSS. Although conventional sports participation has not been 
related to sensation seeking, the lower scores of male participants on ImpSS were not
expected. The results, however, confirm our feeling that ImpSS is not relevant to the 
kind of physical risk-taking in ordinary sports. 


Final Comments 


The development of the ZKPQ began in the 1980's (Zuckerman et al., 1988) before 
Costa and McCrae (1992) had expanded their NEO from three to five factors to fit 
the popular big-five theory evolved from the lexical analyses of Goldberg (1990) 
and others (Norman, 1963). The definitions of the major five factors in the two
models differ, and the content inclusion is somewhat narrower in the alternative-five
model, but there is strong convergence of three of the five factors and moderate convergence
on a fourth (ImpSS vs. Conscientiousness). The fifth factors are entirely
different, Openness in the big-five and Activity in the alternative-five. The differ- 
ences among the first four factors are in the placement of facet traits in the NEO and 
the ZKPQ and in what is a facet, or minor trait, and what is a major trait. My inves- 
tigation of the realm of traits began as an attempt to see where sensation seeking fits 
in the broader family of traits and therefore we included several types of sensation 
seeking as well as impulsivity scales in our initial factor analyses. The reason for
sampling more widely among sensation seeking and impulsivity scales was that
these constructs had proven to be quite important in psychobiological research and 
we were striving to establish a framework for a psychobiological model of personal- 
ity (Zuckerman, 1991). Similarly, traits like aggression and activity were selected 
because of their importance in the biological and comparative literature. 

Reliability findings for the scales are fairly robust even though we can identify 
subfactors in three of the five major scales. Convergent and discriminant validities
are also strong. Research shows good concurrent and predictive validity in the areas 
of psychopathy, drug abuse, and risk-taking in general. Translated scales in German, 
Spanish, Catalan, Japanese, and Chinese have shown good factor reliabilities and 
internal scale reliabilities, suggesting cross-cultural generality of the personality
constructs. We hope that the easy availability of the ZKPQ gratis to all interested
researchers will continue to stimulate research with the instrument, particularly in 
the areas of genetics, psychopharmacology, psychophysiology, and psychopathol- 
ogy. This particular five-factor model is based on an evolving psychobiological 
model but much more research is needed to develop the model to its fullest potential.


Second-order factor structure of the Cattell
Sixteen Personality Factor Questionnaire 


Scott M. Hofer 
Herbert W. Eber 


Introduction 


The Cattell Sixteen Personality Factor (16PF) Questionnaire has been one of the
most studied instruments in the history of personality research. A conservative esti- 
mate of research using the 16PF Questionnaire would include upwards of 2100 pu- 
blications since 1974 (see IPAT, 1991 for 1974-1991 references). The 16PF Questi- 
onnaire (Cattell, 1949) has undergone four revisions, in 1956, 1962, 1967-1969 
(Cattell, Eber, & Tatsuoka, 1970), and in 1988-1993 resulting in the current, Fifth 
Edition of the Sixteen Personality Factor Questionnaire (Cattell, Cattell, & Cattell, 
1993; Conn & Rieke, 1994; Russell & Karol, 1994). 


Theoretical and historical rationale 


In the early 1940's, Cattell (1943, 1945) began a vigorous program of research into 
the structure of personality, one that was based on factor analysis of what he termed 
the "personality sphere"— a complete range of trait-variables that have been defined 
in language. Given the limits of performing factor analysis at that time, it was neces- 
sary to reduce the number of variables to a smaller number of clusters on which to 
base an empirical analysis of personality (for a historical review, see H.E.P. Cattell, 
1996; John, Angleitner, & Ostendorf, 1988). Subsequent factor and cluster analysis 
led to continued refinement of the primary scales over his career, work that continues
at the Institute for Personality and Ability Testing (Cattell et al., 1993; Conn &
Rieke, 1994).

Development of the 16PF Questionnaire, from the beginning, has included infor- 
mation from peer or observer ratings, self-reports, and objective behavioral data. 
Cattell's approach was multivariate in all forms — consistent personality attributes 
should be observable by others, reported in questionnaire format, and manifest in an 
individual's behavior. Replication across these three modes was considered to lead 
to source traits defining significant components of personality. Consistent structures 
across methods were an integral aspect within Cattell's development scheme. 

Cattell conceptualized personality in terms of a hierarchical factor structure. He 
emphasized the likelihood that real influences would, in general, be correlated and 
thus he eschewed orthogonal factor solutions. Simple structure was to be the ulti- 
mate guide, based on the obvious logic that, within any system of multiple causes, 
any one behavior was most likely influenced by far fewer than the total set of poten- 
tial causes. That, in turn, demanded zeroes (or near-zeroes) in the factor pattern ma- 
trix. Cattell developed ingenious methods, not always totally objective, for rotating 
to this simple structure. 

Given that primary factors were permitted to be correlated, the hierarchical notion 
of second (or higher) order factors becomes obvious. Although the emphasis was at 
the primary level of personality structure, Cattell and colleagues reported second-order
global factors of personality based on these primary factors. Indeed, the research
that has led to an emphasis on the five broad factors of personality was initially
based on Cattell's 35-variable set, which formed the basis for the 16PF (Norman,
1963; Tupes & Christal, 1961; see also H.E.P. Cattell, 1996). Extensive reviews of 
the historical achievements in personality research, including Cattell's contributions,
may be found elsewhere (H.E.P. Cattell, 1996; John et al., 1988).


Primary factor structure of the 16PF questionnaire 


The 16PF Questionnaire consists of fifteen personality scales and a brief reasoning 
scale. The primary structure has been satisfactorily replicated in studies based on 
samples differing in language, culture, and education that ensured sufficient variability
(see Mershon & Gorsuch, 1988 for a review; e.g., Cattell, 1946; 1947; 1956b;
1973; Cattell et al., 1970; Cattell & Krug, 1986; Howarth & Browne, 1972). The
16PF primary scales are shown in Table 1. 


Table 1. The 16PF primary scales

Factor name          Label  Descriptors of high range                     Descriptors of low range
Warmth               A      Warm, Outgoing, Attentive to Others           Reserved, Impersonal, Distant
Reasoning            B      Abstract                                      Concrete
Emotional Stability  C      Emotionally Stable, Adaptive, Mature          Reactive, Emotionally Changeable
Dominance            E      Dominant, Forceful, Assertive                 Deferential, Cooperative, Avoids Conflict
Liveliness           F      Lively, Animated, Spontaneous                 Serious, Restrained, Careful
Rule-Consciousness   G      Rule-Conscious, Dutiful                       Expedient, Nonconforming
Social Boldness      H      Socially Bold, Venturesome, Thick-Skinned     Shy, Threat-Sensitive, Timid
Sensitivity          I      Sensitive, Aesthetic, Sentimental             Utilitarian, Objective, Unsentimental
Vigilance            L      Vigilant, Suspicious, Skeptical, Wary         Trusting, Unsuspecting, Accepting
Abstractedness       M      Abstracted, Imaginative, Idea-Oriented        Grounded, Practical, Solution-Oriented
Privateness          N      Private, Discreet, Non-Disclosing             Forthright, Genuine, Artless
Apprehension         O      Apprehensive, Self-Doubting, Worried          Self-Assured, Unworried, Complacent
Openness to Change   Q1     Open to Change, Experimenting                 Traditional, Attached to Familiar
Self-Reliance        Q2     Self-Reliant, Solitary, Individualistic       Group-Oriented, Affiliative
Perfectionism        Q3     Perfectionistic, Organized, Self-Disciplined  Tolerates Disorder, Unexacting, Flexible
Tension              Q4     Tense, High Energy, Impatient, Driven         Relaxed, Placid, Patient

Note: Adapted from the 16PF Fifth Edition Technical Manual (Conn & Rieke, 1994; Table 1.5)
with permission from the publisher.


Second-order factor structure of the 16PF Questionnaire


At the second stratum, at least five broad personality factors have been identified with
considerable confidence across diverse samples of subjects (e.g., Bolton, 1977;
Cattell, 1956a; 1956b; Cattell & Cattell, 1995; Cattell et al., 1970; Gerbing & Tuley,
1991; Gorsuch & Cattell, 1967; Hofer, Horn, & Eber, 1997; Horn, 1963; Karson,
1961; Karson & Pool, 1958; Krug & Johns, 1986; Matthews, 1989). These second-order
factors account for much of the reliable covariance among the primary factors.
For example, in the 16PF Fifth Edition Questionnaire, a six-factor solution (inclu- 
ding an intelligence factor indicated by the B primary scale) accounts for 70 per cent 
of the total variance of the 16PF primary scales (Conn & Rieke, 1994). The brief 
reasoning scale (B primary) indicates a separate and sixth factor. The results of ma- 
ny studies have shown that intellectual ability factors of this kind are separate from, 
although some are correlated with, self-report dimensions of personality. The five 
major second-order personality factors and the significant primary scales that indicate
them are shown in Table 2. However, historically, the diversity of factor analytic
techniques and use of different mathematical rotations (e.g., orthogonal versus
oblique) to evaluate the factor structure across diverse samples has made the evidence
for this second-order structure somewhat mixed (see H.E.P. Cattell, 1995; Cattell
& Cattell, 1995; Chernyshenko, Stark, & Chan, 2001; Hofer et al., 1997; Horn, 
1963). In a large-scale reanalysis of the national standardization sample of the 
Fourth Edition of the 16PF Questionnaire and Clinical Analysis Questionnaire 
(CAQ), Krug and Johns (1986) found that seven major personality dimensions ac- 
count for most of the variance of the 16PF primary personality scales. The five ma- 
jor factors that closely resemble the Big-Five factors are Extraversion, Anxiety, 
Tough-Mindedness, Independence, and Self-Control. 

The results of Krug and Johns (1986) differed only slightly from an earlier analy- 
sis of second-order factors reported by Cattell et al. (1970). Additionally, the five 
major second-order factors were sufficiently replicated in a study by Noller, Law, 
and Comrey (1987) and reanalyzed by Boyle (1989) where the 16PF Questionnaire 
was analyzed with the Eysenck Personality Questionnaire and the Comrey Persona- 
lity Scale. However, several studies of the second-order structure of the Fourth and 
earlier versions of the 16PF Questionnaire report divergent findings on the number 
of factors. Gorsuch and Cattell (1967) and Cattell (1994) extracted eight factors. 


Table 2. Global factor dimensions of the Sixteen Personality Factor Questionnaire

[Columns: the five global factors; individual column headings illegible in source.
Loadings are listed in left-to-right order.]

Factor name       Label  Loadings
Warmth            A      .74   .35
Reasoning         B
Emotion. Stabil.  C      -.70
Dominance         E      .87
Liveliness        F      .70   -.39
Rule-Conscious    G      .78
Social Boldness   H      .44   .43
Sensitivity       I      -.75
Vigilance         L      57    [illegible]
Abstractedness    M      -.39  -.58
Privateness       N      -.67
Apprehension      O      76
Open. to Change   Q1     -.68  .49
Self-Reliance     Q2     -.81
Perfectionism     Q3     82
Tension           Q4     .86

Note: Rotated factor loadings (decimals and loadings < .30 omitted) based on the national
standardization sample (N = 3,498). Adapted from the 16PF Fifth Edition Technical Manual (Conn &
Rieke, 1994; Table 1.3) with permission from the publisher.

Argentero (1989) extracted eight factors with orthogonal rotation on an Italian version
of the 16PF with the finding that five factors were found to be robust across men
and women: Extraversion, Anxiety, Control, Tough-Mindedness, and Intelligence 
while the Independence factor was not identified. A similar second-order factor 
structure of the 16PF Questionnaire across men and women was reported by
Karson and O'Dell (1974). 

Hofer et al. (1997) reported results from an analysis of factorial invariance of the 
second-order structure of two forms of the 16PF Questionnaire across large samples 
of police applicants and convicted felons. Evidence for a five-factor second-order 
structure, based on the primary scales and excluding the ability factor, was obtained 
across diverse samples and forms. The factor structure was largely congruent with 
the findings of Boyle (1989; see also Noller et al., 1987), Krug and Johns (1986), 
and Cattell et al. (1970). The tests of factorial invariance showed that constraints of 
strict factorial invariance (equivalent factor loadings, variable means, and variable 
uniquenesses) as well as substantive model constraints of invariant factor intercor- 
relations and variances provided a reasonable fit across samples within the major 
groups of police applicants and felons. Chernyshenko et al. (2001) report clear findings
for sixteen primary factors and five second-order factors based on hierarchical
factor analysis (i.e., Schmid-Leiman procedure) of multiple-item composites from a 
large sample of respondents (N = 11,846). These studies provide strong evidence in 
support of a five personality factor structure of the 16PF Questionnaire that closely 
approximates the Big Five factor pattern. 
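
The hierarchical (Schmid-Leiman) decomposition mentioned above can be illustrated with a toy example. The sketch below assumes that a first-order pattern matrix and a second-order loading matrix have already been estimated and that the second-order factors are orthogonal. The function name, the four-variable layout, and all loadings are hypothetical; they are not taken from Chernyshenko et al. (2001).

```python
import math


def schmid_leiman(first_order, second_order):
    """Orthogonalize a two-level factor solution (toy sketch).

    first_order:  p x k list of lists, loadings of variables on primary factors.
    second_order: k x m list of lists, loadings of primary factors on
                  second-order factors (assumed orthogonal).
    Returns (general, residual): loadings of the variables on the second-order
    factors, and on the residualized primary factors.
    """
    k = len(second_order)
    m = len(second_order[0])
    # Square root of each primary factor's uniqueness: the part of the
    # primary factor not explained by the second-order factors.
    u = [math.sqrt(1.0 - sum(x * x for x in row)) for row in second_order]
    # Variable-by-second-order loadings: first_order times second_order.
    general = [
        [sum(row[j] * second_order[j][g] for j in range(k)) for g in range(m)]
        for row in first_order
    ]
    # Residualized primary loadings: each column scaled by its uniqueness root.
    residual = [[row[j] * u[j] for j in range(k)] for row in first_order]
    return general, residual


# Hypothetical loadings: four variables, two primaries, one second-order factor.
first_order = [[0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.9]]
second_order = [[0.6], [0.8]]

general, residual = schmid_leiman(first_order, second_order)
for g_row, r_row in zip(general, residual):
    print([round(x, 2) for x in g_row], [round(x, 2) for x in r_row])
```

Under simple structure, each variable's squared first-order loading is split between the second-order factor and the residualized primary factor, which is why the procedure is useful for judging how much of a scale's variance is attributable to the broad factors.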

Several recent studies have examined the factor-level correlations among the 
16PF global factors and broad factors from questionnaires used to indicate the Big
Five (e.g., Barbaranelli & Caprara, 1996; Boyle, 1989; H.E.P. Cattell, 1995; 1996;
Noller et al., 1987). The Fifth Edition 16PF Technical Manual contains comparisons
of the 16PF primary and global factors with other scales, including the NEO-PI-R, 
shown in Table 3. The global factors for the two tests are highly congruent, with 
many of the NEO PI-R facet scales having their highest association with the corres- 
ponding 16PF Global scale. H.E.P. Cattell (1996) reported comparisons across the 
16PF Fifth Edition and the NEO-PI-R and found a high degree of concordance 
across the five broad factors but also important differences in the conceptualization 
of these factors. We would expect no less; strict concordance despite different me- 
thods is not yet always a realistic expectation in our science. 

Extraversion and Anxiety (Neuroticism or, conversely, Adjustment) have been 
well-identified across different questionnaires. It almost could not be otherwise. 
These two broad factors are so pervasive that any personality data in which they do 
not appear should be suspect as to data errors. Self-Control (Conscientiousness) 
shows a high degree of concordance across questionnaires. The Independence 
(Agreeableness) and Tough-Mindedness (Openness to Experience) factors exhibit 
the least correspondence across questionnaires (e.g., H.E.P. Cattell, 1993; 1996). 
While there are clearly nuances in how each of these factors is defined across different
broad-factor systems, it is clear that the corresponding factors are highly similar
conceptually.

Table 3. Correlations between global factor dimensions of the Sixteen Personality Factor
Questionnaire and the NEO PI-R Scales.

[Columns: 16PF Global Factor Dimensions; individual column headings illegible in source.
Correlations are listed in left-to-right order.]

NEO PI-R Factor Scales
Extraversion             .65  -.21  .36  -.29
Neuroticism              -.31  .75  2227
Openness                 .56  -.25
Agreeableness            .28  -.42
Conscientiousness        -.21  .29  .66

NEO PI-R Facet Scales
E1: Warmth               .61  -.24  .20
E2: Gregariousness       .70  -.23  52  -.23
E3: Assertiveness        .45  -.26  .60
E4: Activity             .21  .40
E5: Excitement Seeking   .39  225  -.25
E6: Positive Emotion     .47  -.29  .77  -.22
N1: Anxiety              -.21  .63  -.23
N2: Angry Hostility      .59
N3: Depression           -.28  .66  -.22
N4: Self-Consciousness   -.31  .55  -.44
N5: Impulsiveness        .30  232
N6: Vulnerability        -.22  251  -.28
O1: Fantasy              .26  -.41  -.35
O2: Aesthetics           .24  -.53
O3: Feelings             .24  -.37
O4: Actions              -.21  -.31  -.32
O5: Ideas                .24
O6: Values               -.33  .20  -.24
A1: Trust                .38  -.47
A2: Straightforwardness  -.31  .20
A3: Altruism             232  -.22
A4: Compliance           -.23  -.44
A5: Modesty              -.34
A6: Tender-Mindedness    .24  -.26
C1: Competence           .22  .39
C2: Order                .28  .57
C3: Dutifulness          .42
C4: Achiev. Striving     .23  .44
C5: Self-Discipline      .21  .44
C6: Deliberation         [illegible]  .23  .57

Note: Correlations below .20 and decimals omitted (N = 257). Adapted from the 16PF Fifth Edition
Technical Manual (Conn & Rieke, 1994; Table 6.1 and Appendix 6D) with permission from the
publisher.


Description of the Sixteen Personality Factor Questionnaire


The Fifth Edition of the Sixteen Personality Factor Questionnaire (Cattell et al.,
1993; Conn & Rieke, 1994; Russell & Karol, 1994) comprises 185 items and
provides scores on 16 primary factor scales, five second-order factors, and an im- 
pression management scale. One of the primary scales is a brief reasoning scale 
(16PF Scale B) involving verbal analogies (both common and esoteric). Each prima- 
ry factor scale contains 10-15 items, with each item scored on a three-choice respon- 
se format. The b response choice for all personality items (except for the Reasoning 
[B] scale) appears as a question mark. 

The 16PF Questionnaire is designed to be administered to individuals 16
years of age and older and has been evaluated on diverse samples of individuals, 
nationally and internationally, in clinical, occupational, and other settings. Alternate 
forms of the 16PF have been developed for particular populations or situations. The- 
se include part of the Clinical Analysis Questionnaire (Krug, 1980), a Form E (Eber 
& Cattell, 1976) useable with adults down to 3rd grade reading level, and even some 
tape recorded presentations to permit testing of virtual illiterates. Forms for younger 
ages, not always covering all the factors because they sometimes were not clearly 
identifiable, included the Jr.-Sr. High School Personality Questionnaire (Cattell, 
Beloff, & Coan, 1958; Cattell, Cattell, & Johns, 1990), the Child Personality Ques- 
tionnaire (Porter & Cattell, 1963), and even an Early School Personality Question- 
naire and a Pre-School Personality Questionnaire (Cattell & Coan, 1973). A new 
Adolescent Personality Questionnaire (Schuerger, 2001) is in press. 

The 16PF Questionnaire can be self-administered individually or in group format 
using either computer-based or paper-and-pencil formats (permitting either hand or 
computer scoring). Test-completion time ranges from 35-50 and 25-35 minutes for 
the paper-and-pencil and computer administration, respectively. The questionnaire is 
designed for administration to individuals with at least a fifth-grade reading profici- 
ency level. 

Normative data for the 16PF Fifth Edition is based on a population stratified 
sample of 2,500 individuals that closely corresponds to the gender, race, age, and education
percentages of the 1990 U.S. census. These data were provided by experienced
16PF administrators in a variety of settings, who were given testing
materials free in exchange for the normative data. Sten ("standardized ten") scores — 
having a mean of 5.5, standard deviation of 2.0 and ranging from 1-10 — were com- 
puted for each scale to provide a basis for comparison across primary scales with 
norms based on both combined-sex and sex-specific samples. 
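
As an illustration of the sten metric described above, the following sketch converts a raw scale score to a sten by a linear transformation of its z-score (mean 5.5, standard deviation 2.0, limited to the range 1 to 10). It assumes approximately normally distributed raw scores; the function name and the norm-group mean and standard deviation are hypothetical, and the published norm tables rather than this formula should be used for actual scoring.

```python
import math


def to_sten(raw, norm_mean, norm_sd):
    """Convert a raw score to a sten (1-10) from norm-group mean and SD.

    Each sten spans half a standard deviation; stens 5 and 6 meet at the
    norm-group mean, so a z-score of exactly 0 is assigned sten 6 here.
    """
    z = (raw - norm_mean) / norm_sd
    sten = math.floor(2.0 * z + 6.0)  # same as rounding 5.5 + 2z half-up
    return max(1, min(10, sten))     # clip to the 1-10 range


# Hypothetical norm values for one primary scale (not real 16PF norms).
NORM_MEAN = 13.0
NORM_SD = 3.0

for raw_score in (6, 10, 13, 16, 22):
    print(raw_score, to_sten(raw_score, NORM_MEAN, NORM_SD))
```

Because every scale is expressed on the same 1 to 10 metric, a sten of 8 on one primary scale and a sten of 8 on another indicate the same relative standing in the norm group, which is the basis for the cross-scale comparison mentioned in the text.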


Reliability


Internal consistency coefficients for the 16PF Fifth Edition primary scales averaged 
.75 (.66 - .86) across two general population samples and one university student
sample (Conn & Rieke, 1994). It was not possible to compute internal consistency 
coefficients for the Global Scales since they were derived from weighted composites 
of the primary scales. In two independent university samples, the median test-retest 
reliability coefficients for the second-order scales were .87 (.84 - .91) and .80 (.70 - 
.82), for two-week and two-month retest intervals, respectively (Conn & Rieke, 
1994). 
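
For readers unfamiliar with internal consistency coefficients, the sketch below computes Cronbach's alpha from item-level responses. The text does not state which coefficient was used, so alpha is an assumption here, and the response matrix (five respondents, three items scored 0, 1, or 2) is invented for illustration only.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(item_scores[0])                      # number of items
    columns = list(zip(*item_scores))            # one tuple of scores per item
    item_var_sum = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1.0 - item_var_sum / total_var)


# Hypothetical responses: rows are respondents, columns are items.
responses = [
    [2, 2, 1],
    [0, 1, 0],
    [1, 1, 1],
    [2, 1, 2],
    [0, 0, 0],
]

print(round(cronbach_alpha(responses), 3))
```

The coefficient rests on treating the scale score as an unweighted sum of items, which is consistent with the remark above that it could not be computed for the Global Scales, since those are weighted composites of the primary scales.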


Validity 


Construct Validity 


Various versions of the 16PF have been compared to questionnaires designed to 
measure the Big Five factor structure. These studies provide evidence for a high 
degree of correspondence between the 16PF and other questionnaires designed to 
measure personality at the broad factor level (H.E.P. Cattell, 1996; Conn & Rieke, 
1994; Gerbing & Tuley, 1991). For example, comparison of the 16PF Fifth Edition 
with the NEO-PI-R (H.E.P. Cattell, 1996) resulted in moderate to high correlations between
corresponding broad factors. Further inspection of the primary scale loadings
on these factors across the two questionnaires, however, reveals different emphases at
the broad factor level, which should be an important issue for the further development of
both primary and secondary factors within personality taxonomies (H.E.P. Cattell,
1996). 


Criterion Validity 


A substantial body of validity data has been obtained for concepts at the level of the 
16 factors of the Cattellian system (e.g., Cattell et al., 1970; Cattell & Krug, 1986).
Mershon and Gorsuch (1988) investigated the criterion validity of the 16PF in terms 
of whether the 16 primary factors or the fewer second-order factors account equally 
for the variance in a criterion variable measuring "aggregated behavior" (job tenure
or supervisors' ratings), with shrunken r's computed to reduce bias associated with different
numbers of predictors. The 16 primary scales accounted for twice as much
variance in the criterion variables as did the second-order factors. These findings
further support the idea that fine-grained personality distinctions may have greater 
utility for many purposes than broad factors (e.g., Goldberg, 1972). In a recent study,
Goldberg (in press) compared several personality inventories and their International
Personality Item Pool equivalents (IPIP; see Goldberg, 1999) in a predictive
validity study of six clusters of behavioral acts varying in social desirability. Al- 
though all questionnaires showed a high degree of predictive validity, the 16PF 
Questionnaire, particularly the IPIP version of the 16PF, was found to have the high- 
est validity coefficients. 
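
The role of shrinkage in comparing 16 predictors with a smaller set can be shown with a short sketch. The text does not say which shrinkage formula Mershon and Gorsuch (1988) used; the code assumes the common adjusted (Wherry-type) R-squared, and the sample size and R-squared values are invented for illustration.

```python
def shrunken_r2(r2, n, k):
    """Adjusted R-squared for n cases and k predictors (assumed formula)."""
    if n - k - 1 <= 0:
        raise ValueError("need more cases than predictors plus one")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical values: the same sample fitted with 16 primaries or 5 globals.
n_cases = 200
r2_primaries = shrunken_r2(0.20, n_cases, 16)
r2_globals = shrunken_r2(0.10, n_cases, 5)

print(round(r2_primaries, 3), round(r2_globals, 3))
```

The correction penalizes the larger predictor set more heavily, so an advantage for the 16 primary scales that survives shrinkage cannot be attributed simply to their greater number.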


Cross-Cultural Generalizability 


The 16PF Questionnaire has been translated into over 40 languages and validated in 
numerous countries worldwide. The primary structure of the 16PF Questionnaire has 
been evaluated across cultures and has resulted in confirmation of all or most of the 
16 factors (e.g., Adcock & Adcock, 1977; Cattell & Nesselroade, 1965; Cattell, 
Pichot, & Rennes, 1961; Krug, 1971; Meschieri & Cattell, 1960; Motegi, 1982; 
Schneewind, 1977; Tsujioka & Cattell, 1965). Generalization across boundaries of
language, of custom, of ethnicity and of geography involves whole new classes of
problems which fall outside the present scope. In general, difficulties and incon- 
sistencies multiply with increased specificity. Broad concepts typically transport 
well, both in research and in application. Narrower focus translates less well. 


Summary 


Analyses based initially on items comprising the 16PF Questionnaire (or adjective 
scales which Cattell developed and on which the 16PF Questionnaire was based) as 
well as on subsequently developed items have led to a five factor theory of persona- 
lity known popularly as the big five (see Block, 1995, for a review; also Fiske, 1949; 
Goldberg, 1990; McCrae & Costa, 1985). The view that five factors, extracted at
the primary factor level, account for a significant proportion of individual differences
in self-report data of questionnaires is supported by numerous studies and reviews
(e.g., De Raad, 1998; Digman, 1990; Goldberg, 1981; Norman, 1963; Saucier
& Goldberg, 1996; Tupes & Christal, 1961). 

It is the case that the global factors extracted at the second-order level of the 
16PF Questionnaire are highly similar to factors known as the Big Five. Science has 
been advanced by the fact that there is much agreement at the broad factor level 
among the NEO-PI-R (Costa & McCrae, 1992), the Goldberg Big Five (Goldberg, 
1990; 1992), and other personality instruments. Disagreements in the definition of 
these broad factor concepts may be a matter of emphasis or may define the next ge- 
neration of problems for personality psychologists. Is there a reason that we should 
not have six factors, or seven, or more that describe major features of personality? 
Cattell regarded the primary factors of personality to be representative of real causes 
— solutions that mixed primary and secondary levels of such factors were regarded 
as incorrect. However, from other viewpoints, mixing levels of analysis may not be 
so negative. Nonetheless, the critical issue is that the developing stabilities of personality
assessment in terms of broad factor structures are well-represented by the
16PF as well as by other questionnaires.


Introduction


In this chapter, we describe the development of a Five Factor scoring system for the 
300-item Adjective Check List. Designed primarily for use in research studies rather 
than for individual personality assessment, the system permits the scoring of any 
selected set of adjective descriptors in terms of the Big Five. We employed ratings 
made by American university students to determine the degree to which each of the 
300 person-descriptive adjectives of Gough and Heilbrun's (1980) Adjective Check 
List was associated with each of the five factors described by the Five Factor Model: 
Extraversion, Agreeableness, Conscientiousness, Emotional Stability, and Openness 
to Experience. These ratings were found to be highly reliable (.97 to .98) and to have 
a high degree of convergent validity with the results of earlier ACL studies by John 
(1989) and McCrae and Costa (1992). 

The scoring system provides a mean score for each of the five factors for any gi- 
ven sub-set of the 300 adjectives, i.e., those chosen as descriptive of a given target. 
Illustrative applications include the cross-cultural examination of gender stereotypes 
in 27 countries (Williams, Satterwhite, & Best, 1999; Williams, Satterwhite, Best, & 
Inman, 2001) and a 20-country study of cross-cultural similarities and differences in 
the relative importance of various psychological traits (Williams, Satterwhite, & 
Saiz, 1998). There are many potential research applications involving the use of the
system to obtain Big Five profiles for individuals or groups, real or hypothetical, or
any other "target" that can be meaningfully personified. Administration typically
takes 15 minutes or less. Information is provided concerning computer scoring
systems and the availability of translations of the ACL item pool to languages other
than English.

Big Five Assessment, edited by B. De Raad & M. Perugini. © 2002, Hogrefe & Huber Publishers.


Development of the ACL item pool


The Adjective Check List (ACL) is a set of 300 person-descriptive adjectives
developed by Harrison Gough and his associates at the University of California, Berkeley
(Gough & Heilbrun, 1980). The ACL is used to record the psychological characte- 
ristics associated with individual persons or groups and has been employed in a wide 
variety of assessment and research contexts. This chapter reports on the develop- 
ment of a system for scoring ACL item sets in terms of the Five Factor Model 
(FFM) of personality. The primary purpose of this system was to enable the study of 
the characteristics associated with groups in terms of the FFM (e.g., gender stereo- 
types) rather than for individual personality assessment, for which excellent instru- 
ments already existed. Before proceeding to a description of the new five-factor 
system — designated ACL-FF — we will review the history of the ACL and briefly 
describe three other theory-based scoring systems that have been used with ACL 
item sets. 

The origin and development of the Adjective Check List item pool have been
described in detail by Gough and Heilbrun (1980). It was initially proposed as a
method of obtaining observers’ descriptions of other individuals (i.e., staff members’
observations of individuals studied in assessment programs). However, it was quic- 
kly observed that the item set could be used in self-descriptions and has been so 
employed quite extensively. The ACL item pool has also been used to characterize 
one's ideal self, a fictitious individual or persona, geographical regions, and other 
"targets" that can be quite easily personified. Inasmuch as language, particularly 
adjectives, is used to describe and specify, the Adjective Check List is then rooted in 
language itself and must therefore be universally applicable for descriptive purposes. 

The first attempt at categorizing such descriptive terms was undertaken by Allport
and Odbert in their 1936 monograph, in which they enumerated 17,953 English words.
This list was condensed by R. B. Cattell (1943, 1946), who developed a trait list of
171 variables, obtained ratings of the items from subject samples, and factor
analyzed the results, reducing the surface clusters to twelve "primary source traits
of personality." Initial attempts to develop the Adjective Check List, made in 1949,
drew 125 adjectives from Cattell's 171 variables, and other items were added
following review of the theoretical viewpoints of Freud, Jung, Mead, and Murray. For
instance, stingy was added to reflect Freud's concept of the anal character, rational
was added to reflect the Jungian rational functions (thinking and feeling), adaptable
reflects Mead's concept of skill in role-taking, and understanding was taken from
Murray's concept of needs. Following review of the instrument in 1950 by the
Institute of Personality Assessment and Research in Berkeley, it was determined that
some important terms had not been included, and thus several changes were made to
remedy the lack of items descriptive of physical characteristics (i.e., attractive,
good-looking, and handsome) and of words representing reactions of males to females
(i.e., charming, fickle, flirtatious, and sexy), among other things. The ACL existed
in its current 300-item form by the end of 1952.

The 300 ACL items provide for a relatively comprehensive description of the tar- 
get being considered. The large size of the item pool permits the inclusion of many 
nearly synonymous adjectives with subtle differences in meanings: e.g., steady, 
stable, unemotional, unexcitable. Despite the large number of items, persons using 
the ACL usually take no more than 20 minutes to complete the description of a 
given target. The 300 English language ACL items are presented in the Appendix at 
the end of this chapter. 


Translations of the ACL items 


The 300 English language items have been translated into more than 20 of the 
world's major languages. Williams and Best and their associates employed translati- 
ons from English to 16 other languages, namely: Bahasa-Malaysia, Chinese, Dutch, 
Finnish, French, German, Hebrew, Italian, Japanese, Korean, Norwegian, Polish, 
Portuguese, Spanish, Turkish, and Urdu. Their studies have resulted in the 300 items
being scaled for (1) relative association with women and men (gender stereotypes)
in 27 countries (Williams & Best, 1990a), (2) relative association with young adults
and old adults (age stereotypes) in 19 countries (Williams, 1993), (3) psychological
importance (i.e., central vs. peripheral traits) in 20 countries (Williams et al., 1998),
and (4) favorability (positive vs. negative characteristics) in 10 countries (Williams
et al., 1998).

The use of translated materials in psychological research is always somewhat 
problematic. Richard Brislin (1980), an authority on this topic, notes that the
translation of individual words is more difficult than the translation of sentences or
paragraphs, which suggests that our translators faced a most challenging task. On the other
hand, Brislin notes the value of redundancy in translated materials and this was a 
positive feature in making our translations. While each of the 300 English adjectives 
has at least a slightly different meaning, there are many near synonyms in the item 
pool. For example, it seems clear that the adjectives stable, steady, unemotional, and 
unexcitable share a substantial common meaning factor. If, for some reason, one 
item is not well translated, one can hope that the others will be and in this way the 
common meaning factor will be represented in the translated item pool. Thus, while 
one must be very cautious in making cross-translation comparisons of responses to 
individual items, one seems on safer ground in making such comparisons between 
broad factor scores that are based on responses to many items. 

We have no formal basis for judging the adequacy of the translation of the item 
pool from English to the other languages. We know that the translations were done 
with care by our cooperating researchers who employed recommended methods 
such as back translation and committee approaches. 

We do have one set of findings that bears indirectly on the question of the fidelity
of the ACL translations. Denotative meaning aside, a critical aspect of translation
fidelity concerns affective meaning, particularly the evaluative connotations of
words, which Osgood, May, and Miron (1975) found to be the principal affective
meaning component in each of the diverse world languages they studied.

Our study (Williams et al., 1998) employed samples of university students who
rated the favorability of each ACL item, or its translated equivalent, on a five-point
scale. The students were from the United States, Nigeria, Singapore, Chile, China,
Korea, Norway, Pakistan, Portugal, and Turkey. Subjects in the first three countries
rated the favorability of the items in the standard English language form, while
subjects in the other seven countries rated the items as translated into their respective
national languages. After computing mean favorability ratings for each of the items
in each sample, a correlation coefficient was computed between the mean ratings in
each pair of countries, across the 300 items. The results were then grouped by
language of administration: among the three English language samples, the median
correlation was .82; among the seven other languages, the median correlation was .82.
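
The pairwise procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `pearson` and `median_pairwise_r` are hypothetical, and the toy ratings (five items instead of 300) are invented, not data from the study.

```python
from itertools import combinations
from statistics import mean, median


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def median_pairwise_r(mean_ratings):
    """Median correlation over all pairs of samples.

    mean_ratings maps a sample name to its list of per-item mean
    favorability ratings, with items in the same order in every sample.
    """
    pairs = combinations(sorted(mean_ratings), 2)
    return median(pearson(mean_ratings[a], mean_ratings[b]) for a, b in pairs)


# Hypothetical toy data: invented mean ratings for 5 items, not study values.
english_samples = {
    "United States": [4.5, 1.8, 3.0, 4.1, 2.2],
    "Nigeria": [4.2, 2.0, 3.3, 3.9, 2.5],
    "Singapore": [4.4, 1.6, 2.8, 4.3, 2.0],
}
english_median = median_pairwise_r(english_samples)
print(f"median pairwise r = {english_median:.2f}")
```

With three samples there are three pairs; with seven samples there are 21, and the median is taken over all of them.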

These findings indicated a high degree of agreement in the favorability ratings
across the eight languages employed, despite the likelihood of at least some bona fide
cultural differences in the favorability associated with particular psychological
traits. The results support the idea of reasonable translation fidelity with regard to 
the important affective meaning dimension of favorability. 

In sum, we feel that our language translations, everything considered, are reaso- 
nably adequate for the uses to which they have been, and may be, employed. 


Earlier scoring systems 


The use of the ACL often results in the selection of 80 - 100 items as descriptive of a 
given target person or group. While analyses may be conducted at the level of
individual items (e.g., Williams & Best, 1990a, chapter 3; Williams et al., 1998, chapter
5), it is often more useful to employ scoring systems that abstract or summarize the
factors underlying the responses to the individual items. Prior to the development of 
the Five Factor system described below, there were three major theoretically based 
scoring systems available dealing, respectively, with psychological needs, affective 
meanings, and ego states. 


Psychological needs 


The original ACL scoring system (Gough & Heilbrun, 1980) yields scores indicating
relative loading on 15 psychological needs (e.g., Dominance, Deference, Nurturance,
and Achievement) for selected ACL item sets. This system was developed by
providing psychology graduate students with definitions taken from Edwards (1959)
and having them select adjectives considered indicative or counter-indicative of each
of the 15 needs. Consensus among the raters was used to code each of the 300
adjectives for each of the psychological needs.


Affective meanings


Based on the three-factor theory of affective meaning developed by Charles Osgood 
and his associates (Osgood et al., 1957, 1975), this system enables one to obtain
scores reflecting the relative Favorability, Strength, and Activity for selected ACL
item sets (Best, Williams, & Briggs, 1980; Williams & Best, 1977). In developing
the system, American university students rated each of the 300 ACL adjectives for
its favorability, strength, or activity, with a separate group of judges employed for
each factor. The mean values obtained in this manner provide a score on each of the 
three factors for each of the 300 items. 


Transactional Analysis Ego States 


Based on the Transactional Analysis Ego States theoretical system of Eric Berne
(1961, 1966), this system provides scores for each ACL adjective that reflect its
“loading” on each of the five functional ego states of Transactional Analysis (TA) 
theory: Critical Parent, Nurturing Parent, Adult, Free Child, and Adapted Child 
(Williams & Williams, 1980). The ego state scores were based on the mean ratings 
of the 300 items by 15 expert judges who were highly trained in TA theory. The 
system enables one to compute mean scores reflecting the relative loading on the 
five ego states for any given set of ACL items. 

The theory-based scoring systems just described have been found useful in a
variety of studies in the personality-social area, including gender stereotypes (Williams
& Best, 1990a), age stereotypes (Williams, 1993), self and ideal self (Williams &
Best, 1990b), and the importance of psychological traits (Williams et al., 1998). The
Five Factor scoring system for the ACL was constructed to enable such research
findings to be expressed in terms of this important, more recently developed
conceptual system.


Development of the ACL-FF system 


Here we provide a general description of the development of the ACL-FF scoring 
system. Additional details may be found in FormyDuval (1993) and in FormyDuval, 
Williams, Patterson, and Fogle (1995). In these earlier reports, the general adjust- 
ment factor was labeled “Neuroticism” with high scores indicative of poor adjust- 
ment. In more recent writings, including the present chapter, we have reversed this 
factor and called it "Emotional Stability" with high scores indicative of good ad- 
justment. 

The subjects for this scaling study were 244 male and 251 female introductory
psychology students at Wake Forest University, primarily freshmen and sophomores,
who participated in order to meet a course requirement. Separate groups of
students rated each of the five factors: 47 men and 49 women rated the Extraversion
factor, 51 men and 50 women rated the Agreeableness factor, 48 men and 48 women
rated the Conscientiousness factor, 48 men and 52 women rated the Emotional
Stability factor, and 50 men and 51 women rated the Openness to Experience factor.

Inasmuch as the five factors appear to be meaningful folk psychology concepts 
and not simply esoteric abstractions of personality psychologists (see McCrae, 
Costa, & Piedmont, 1993), one would reasonably assume that laypersons should be 
able to understand and make judgments about the five factors. Thus, it was conside- 
red reasonable to employ undergraduate students for the present study. 

Subjects were provided a booklet containing both an extensive set of instructions 
and all 300 ACL items. They were initially given a brief description of all five fac- 
tors (referred to as "characteristics") and told “personality psychologists believe that 
a description of an individual containing information regarding all five [factors or 
‘characteristics’] is a reasonably complete one." Subjects were then given a more 
complete set of instructions regarding the particular characteristic they were asked to 
rate. Costa and McCrae (1992) describe the factors in terms of their facets and these 
facet descriptions were used in the present study to illustrate the factors to the
subjects. Illustrative examples were provided with three ACL items obtained from
John's (1989) list that "may be representative [indicative] of this characteristic" and
three items that "may suggest the opposite [counterindicative] of this characteristic."
Finally, the subjects were instructed to rate all 300 ACL items in terms of the extent 
to which they seemed indicative or counterindicative of their single assigned factor 
on a 5-point scale from -2 (highly counterindicative) to 2 (highly indicative) with a 
rating of 0 to indicate in-between or not related. Specific instructions were given as 
follows: 

Characteristic I (Extraversion): "You will be asked to consider individual adjectives 
and to give your impression of the extent that each adjective is representative of 
Characteristic I or the opposite of Characteristic I. Characteristic I consists of several 
different facets. These include gregariousness, assertiveness, activity, excitement- 
seeking, positive emotions, and warmth.... You are asked to think about each
adjective in terms of the degree to which it is representative of one or more of the facets
of Characteristic I.... For each adjective, circle the number which you feel best
reflects the degree of Characteristic I...." Illustrative items were outgoing, active, and
warm; and reserved, retiring, and withdrawn. 

Characteristic II (Agreeableness): "Characteristic II consists of several different
facets. These include trust, straightforwardness, altruism, compliance, modesty, and 
tender-mindedness. An adjective which suggests one or more of these facets is 
considered indicative of the characteristic...however, an adjective which suggests the 
opposite of one or more of these facets is considered counterindicative of the cha- 
racteristic....” Illustrative items were trusting, modest, and sympathetic; and fault- 
finding, quarrelsome, and stingy. 

Characteristic III (Conscientiousness): "Characteristic III consists of several
different facets. These include competence, order, dutifulness, achievement-striving,
self-discipline, and deliberation...." Illustrative items were reliable, conscientious, and
deliberate; and careless, disorderly, and frivolous.

Characteristic IV (Emotional Stability): "Characteristic IV consists of several diffe- 
rent facets. These include anxiety, angry hostility, depression, self-consciousness, 
impulsiveness, and vulnerability..." Illustrative items were contented, unemotional, 
and stable; and anxious, touchy, and impulsive. 

Characteristic V (Openness to Experience): "Characteristic V reflects openness to 
new or unfamiliar experiences. This openness may be reflected in an appreciation of 
knowledge, various art forms, and nontraditional values as opposed to an appreciati- 
on of tradition and the status quo. This characteristic may be revealed in several 
different facets of an individual's behavior, including values, ideas, actions, feelings, 
fantasy, and appreciation of aesthetics..." Illustrative items were imaginative, artis- 
tic, and original; and shallow, narrow-interests, and commonplace. 

To control for possible order/fatigue effects, the order of the items was counter- 
balanced by dividing the ACL into thirds such that set A consisted of items 1-100, 
set B consisted of items 101-200, and set C consisted of items 201-300. Booklets 
were then distributed with the ACL items presented in the following orders: ABC, 
ACB, BAC, BCA, CAB, CBA. Each of the six orderings was distributed equally 
among the subjects. 
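
The counterbalancing scheme can be illustrated with a short Python sketch. It assumes a simple round-robin assignment of booklets to raters; how booklets were actually handed out is not stated, and the names `ITEM_SETS`, `booklet_items`, and `assign_orders` are hypothetical.

```python
from collections import Counter
from itertools import cycle, permutations

# The three thirds of the 300-item list, as described in the text.
ITEM_SETS = {"A": range(1, 101), "B": range(101, 201), "C": range(201, 301)}

# All six orderings of the three sets: ABC, ACB, BAC, BCA, CAB, CBA.
ORDERS = ["".join(p) for p in permutations("ABC")]


def booklet_items(order):
    """Item numbers in presentation order for one booklet, e.g. order='BCA'."""
    return [item for block in order for item in ITEM_SETS[block]]


def assign_orders(subject_ids):
    """Round-robin assignment (an assumption) so each order is used equally."""
    return dict(zip(subject_ids, cycle(ORDERS)))


# Example: a rating group of 96 subjects gets 16 booklets of each order.
assignment = assign_orders(range(1, 97))
order_counts = Counter(assignment.values())
print(order_counts)
```

When the group size is not a multiple of six, round-robin assignment leaves the order counts differing by at most one.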

The study was carried out over two semesters with approximately one-half of the 
ratings obtained during the fall semester and one-half completed during the spring 
semester. Ratings for all five factors were obtained in both semesters to prevent the 
confounding of factor ratings with semester. The same female examiner presented 
the procedure to groups of approximately 30 individuals with most subjects finis- 
hing in 20 to 30 minutes. 

The original rating scale presented to subjects ranged from -2 to +2. This scale 
was converted to a 1 to 5 scale where 1 is highly counterindicative, 3 reflects an 
intermediate (or unrelated) position, and 5 is highly indicative or highly characteris- 
tic of the factor. Following this transformation, means were computed separately by 
gender across all 300 items for each factor with the results shown in Table 1. Note 
that all means are close to the mid-point of the scale (3.00) suggesting relatively 
equal numbers of items considered indicative and counterindicative of each factor. 
Standard deviations were sizable, indicating diversity among the adjectives in the 
extent to which they were thought indicative or counterindicative of the five factors. 
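
As a minimal sketch of the rescaling step, the conversion from the -2 to +2 response scale to the 1 to 5 scale amounts to adding 3 to each rating before averaging. The function names below are hypothetical and the four example ratings are invented.

```python
from statistics import mean


def to_five_point(raw):
    """Map a raw rating on the -2..+2 scale onto the 1..5 scale."""
    if raw not in (-2, -1, 0, 1, 2):
        raise ValueError(f"rating out of range: {raw!r}")
    return raw + 3


def item_mean(raw_ratings):
    """Mean rescaled rating of one adjective across a group of raters."""
    return mean(to_five_point(r) for r in raw_ratings)


# Invented example: four raters judge one adjective on one factor.
example = item_mean([2, 1, 2, 0])
print(example)
```

A mean near 3 corresponds to an adjective judged unrelated to the factor, which is why factor-level means close to 3.00 suggest a rough balance of indicative and counterindicative items.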


Table 1. Factor means, standard deviations, and correlations between gender groups for 300 ACL
items

                            Male subjects     Female subjects
Factor                      Mean     SD       Mean     SD        r
Extraversion                3.07    0.86      3.02    0.99     .98*
Agreeableness               2.96    0.89      2.95    0.99     .98*
Conscientiousness           3.15    0.84      3.16    0.82     .98*
Emotional Stability         2.92    0.68      3.00    0.80     .97*
Openness to Experience      3.05    0.72      3.05    0.80     .97*

*p < .001.


Data were initially analyzed by gender to determine the degree of agreement in
the ratings by women and men subjects. Pearson product-moment correlation coeffi- 
cients were computed between men's and women's ratings across the 300 ACL 
items for each factor. As can be seen in Table 1, the results revealed a high degree of
agreement between men and women raters. Such high correlations suggest high
reliability in the ratings, given that any true gender effects would serve to reduce the 
correlations. Thus, these correlation coefficients can be viewed as an indication of 
the lower limit of the reliability of the ratings. Given such high agreement between 
men and women, it was deemed appropriate to pool the male and female ratings in 
further analyses. Factor scores for each item were calculated by summing all of the 
subjects' ratings of that item (approximately 100 subjects for each of the five fac- 
tors), after which the average was calculated for that item. The mean five-factor 
ratings for each of the 300 ACL items are listed in the Appendix at the end of this 
chapter. 
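
Once item-level weights exist, scoring a description reduces to averaging the weights of the adjectives that were checked. The sketch below illustrates this for a single factor. The three Extraversion weights are values reported in this chapter for imaginative, inventive, and interests wide; the function name `score_item_set` is hypothetical, and a full implementation would hold weights for all five factors and all 300 items.

```python
from statistics import mean

# Extraversion weights for three ACL items, as reported in this chapter.
# A complete table would cover all 300 items and all five factors.
EXTRAVERSION_WEIGHTS = {
    "imaginative": 4.14,
    "inventive": 4.00,
    "interests wide": 4.47,
}


def score_item_set(checked_items, weights):
    """Mean factor weight of the adjectives checked for one target."""
    missing = [adj for adj in checked_items if adj not in weights]
    if missing:
        raise KeyError(f"no weight available for: {missing}")
    return mean(weights[adj] for adj in checked_items)


extraversion_score = score_item_set(
    ["imaginative", "inventive", "interests wide"], EXTRAVERSION_WEIGHTS
)
print(round(extraversion_score, 2))
```

Because the score is a mean rather than a sum, descriptions with different numbers of checked adjectives remain comparable on the same 1 to 5 scale.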

To examine the relations among the five factors, product-moment correlations 
between the 300 ratings for each pair of factors were computed, the results of which 
are shown as the left-hand values in Table 2. Given that the Five-Factor Model has 
historically been derived from orthogonal factor rotations, the five factors were 
expected to be relatively independent. However, as seen in Table 2, inter-factor 
correlations were rather high. Indeed, the mean common variance between pairs of 
scales was 48 per cent. Since previous research had shown each of the factors to 
have a substantial favorability component (see below), it was hypothesized that the 
common variance among the factors might be attributable to this shared favorability. 
To examine this hypothesis, the authors employed data from a previous study (Wil- 
liams & Best, 1977) in which the 300 ACL items had been rated for favorability by 
university student judges. These mean item favorability ratings were correlated with 
mean item factor ratings for each of the five factors resulting in high correlations 
between each factor and favorability: .84 for Extraversion; .94 for Agreeableness; 
.80 for Conscientiousness; .80 for Emotional Stability; .72 for Openness. In this 
analysis, favorability accounted for anywhere between 52 per cent and 88 per cent of 
the variability in factor ratings. (To ensure that the high agreement between men and
women raters could not be attributed solely to favorability, partial correlations
between males' and females' ratings for each factor were examined, controlling for the
favorability variable. The resulting correlations were: .93 for Extraversion; .88 for
Agreeableness; .95 for Conscientiousness; .92 for Emotional Stability; .95 for Openness.
Thus, it appeared that the agreement between the sexes was indeed independent of
favorability.)


Table 2. Interfactor correlation matrix.

Factor    Agr           Con           Ems           Opn
Ext       .79 (.03)     .70 (.11)     .64 (-.08)    .89 (.76)
Agr                     .71 (-.19)    .79 (.19)     .64 (-.13)
Con                                   .63 (-.01)    .56 (-.03)
Ems                                                 .51 (-.16)

Note: Extraversion (Ext), Agreeableness (Agr), Conscientiousness (Con), Emotional Stability (Ems),
and Openness to Experience (Opn). In parentheses are correlations after favorability was partialled
out.

To remove the influence of favorability from the principal analyses, partial cor- 
relation coefficients were computed for each pair of the five factors while control- 
ling for favorability. The partial correlations are presented in parentheses in Table 2. 
Examination of this table reveals that removal of the variance attributable to favora- 
bility resulted in a dramatic reduction of the inter-factor correlations such that the 
factors appear to be generally independent of one another, with the exception of the 
relation between Openness to Experience and Extraversion. These two factors re- 
mained relatively highly correlated with a common variance of 58 per cent. Accor- 
ding to university student judges, therefore, there appears to be an empirical, positi- 
ve relationship between these two factors. 
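
For readers who wish to check the partialled values, the standard first-order partial correlation can be computed directly from the three zero-order correlations. The sketch below applies it to Extraversion and Openness, using the rounded values reported above (.89 between the two factors, .84 and .72 with favorability). Whether the original analysis worked from rounded or unrounded correlations is not stated, so small discrepancies from Table 2 are to be expected for other factor pairs. The function name `partial_r` is hypothetical.

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Extraversion (x), Openness (y), favorability (z), rounded reported values.
ext_opn = partial_r(0.89, 0.84, 0.72)
print(round(ext_opn, 2))
```

Squaring a partial correlation gives the proportion of variance the two factors share once favorability has been removed.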

One possible explanation for the aforementioned relationship between the factors 
Openness to Experience and Extraversion is the relative poverty of certain types of 
trait adjectives in the ACL item pool that might be considered indicative of an indi- 
vidual who is indeed open to experience. In John's (1989) previous study, graduate 
student raters placed a large number of adjectives associated with stimulus-seeking 
individuals in the Openness to Experience dimension (e.g., imaginative, inventive, 
interests wide), adjectives that may also be descriptive of extraverted individuals. 
Indeed, the student judges in this study rated those three items rather high on the
Extraversion dimension with scores of 4.14, 4.00, and 4.47, respectively. Therefore,
the ACL item pool may not include a sufficient number of items that distinctively 
represent the Openness to Experience factor. Since "misery loves company," note 
that, among five factor researchers, this same factor has often been found to be the 
most difficult to define and conceptualize, as reflected in the variety of different 
names that have been offered (Digman, 1990; see also De Raad & Van Heck's [1994]
special issue of the European Journal of Personality).


Convergence with findings from other ACL studies 


Earlier work by John (1989) linking the ACL and Big Five resulted in groupings of 
ACL items rated by graduate student judges as being either indicative (I) or coun- 
terindicative (CI) of each of the five factors. For each factor, data from the present 
study were used to compute the mean rating of John's groups of I and CI adjectives. 
Mean ACL-FF scores for John's I and CI items, respectively, were: Extraversion, 
4.23 and 1.53; Agreeableness, 4.47 and 1.58; Conscientiousness, 4.47 and 1.58;¹
Emotional Stability, 4.27 and 1.66; and Openness to Experience, 3.86 and 1.97.
These values show high differentiation in the expected direction for each set of
John's items, thus providing substantial evidence of convergent validity between
John's system and the ACL-FF system. As noted earlier, the undergraduate student
raters in the present study were given only brief descriptions of each factor.
However, their ratings were highly congruent with those of trained graduate students,
thus supporting the idea that the five factors are easily understood "folk psychology"
concepts.

¹ The identical reported values for Agreeableness and Conscientiousness are correct.

In another earlier study, McCrae and Costa (1992) identified ACL items found to 
be significantly correlated with one or more facets of each of the five factors, positi- 
vely or negatively, as measured by the NEO-PI-R. Within each factor, we computed 
the mean ACL-FF score for the sets of ACL items collapsed across facets with the 
following results for positively and negatively correlated items, respectively: Extra- 
version, 4.31 and 1.65; Agreeableness, 4.49 and 2.48; Conscientiousness, 4.55 and 
2.11; Emotional Stability, 3.82 and 1.84; and Openness to Experience, 3.93 and 
2.24. These findings indicate substantial convergent validity between the new ACL- 
FF system and the widely used NEO-PI-R. 


The favorability of the five factors 


It was shown above that, for each of the five factors, the scoring weights were found 
to have substantial correlations with the independently rated favorability of the 300 
ACL items. Thus, an ACL description that is relatively high on the five factors will 
be a generally favorable one and, conversely, a generally favorable description will 
tend to be relatively high on the five factors. 

Further evidence of the evaluative nature of the factors is found in two studies
reported by Goodman and Williams (1996) and summarized by Williams et al.
(1998). These studies employed Costa and McCrae's (1992) NEO-Five Factor
Inventory (NEO-FFI) for the assessment of the five factors, with the Neuroticism
factor reversed as Emotional Stability. In the first study, it was demonstrated that, 
for each factor, items phrased in an "indicative" manner (e.g., extraverted, agreeable, 
etc.) were rated more favorably than items phrased in a "counter-indicative" manner 
(e.g., introverted, disagreeable, etc.). In a second study, one group of participants
was instructed to "fake good" on their NEO-FFI self-descriptions while a second 
group was instructed to “fake bad." The result was that, for each factor, the mean 
"fake good" scores were much higher than the "fake bad" scores. Both studies were 
considered to support the idea that, for each factor, higher scores are more favorable 
than lower scores. We suspect that similar results would be found with most other 
Big Five assessment procedures, such as those described elsewhere in this book. 

How should one view the linkage between favorability and the dimensions of the 
Big Five? Should it be viewed as a "problem" for which one attempts to make
corrections (à la social desirability)? Or are the five factors intrinsically evaluative,
to be accepted as such? We favor the latter view, based on the following
considerations.

Osgood and his associates (Osgood et al., 1975) explored the dimensions of
affective (connotative) meaning in a large group of the world's languages. They
studied English-speaking Americans and 22 other language/culture groups and found
that, in each sample, the primary dimension was Evaluation, or favorability.
Denotative meanings aside, the connotative meanings of words reflected, primarily, their
relative "goodness/badness" in their respective languages. Since the Five Factor
Model was based on a lexical approach, identifying personality descriptive terms in
the English language, it should not be surprising that the five factors carry evaluative
connotations. If, as many believe, the five factors reflect the basic concerns that
people have when behavior is being assessed, we should not be surprised that an
important element of this assessment is separating the "good guys" from the "bad
guys." Persons who are extraverted, agreeable, conscientious, emotionally stable,
and open-minded are viewed more favorably than persons who are introverted,
disagreeable, irresponsible, neurotic, and close-minded. This appears to be the view
that emerges from the language itself.

There is, of course, the possibility of "too much of a good thing": excessive
extraversion might border on the manic, excessive conscientiousness on the
compulsive, and so on. With such possible exceptions, we conclude that, through the greater part of
their score ranges, all five factors have an intrinsic positive association with favora- 
bility. While researchers sometimes may choose to study the five factors with fa- 
vorability controlled (e.g., see Table 2 above), they must bear in mind that they are 
examining artificially contrived concepts rather than the naturally occurring factors 
with their intrinsic favorability components. 


Two illustrative research applications 


Gender Stereotypes 


Here we describe two recent studies in which the ACL-FF scoring system has been 
used. The first study involved the re-analysis of the gender stereotype data from the 
Williams and Best (1990a) project in which the data originally had been analyzed in 
terms of the three earlier scoring systems described above. In each of 27 countries 
from the Americas, Europe, Africa, Asia, and Oceania, university students had 
judged each ACL item (or its translated equivalent) as to whether, in their respective 
cultures, the adjective was more frequently associated with men or with women, or 
not differentially associated by gender. 

In each sample, a stereotype index score, called the M% score, was computed
for each of the 300 items by employing the responses of all subjects and dividing
the frequency of association with men by the sum of the frequencies associated with
men and with women, expressed as a percentage (the frequency of equal association
responses was not used). Computed in this way, high M% scores indicated items
highly associated with men and low M% scores indicated items highly associated with women. 
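
As a minimal sketch of the index just described, the following Python function computes an M% score from response frequencies. The function name and the response counts are hypothetical illustrations, not material from the original project; the only thing taken from the text is the rule itself (men divided by men plus women, with "equally associated" responses ignored).

```python
def m_percent(freq_men: int, freq_women: int) -> float:
    """Return the M% stereotype index for one adjective in one sample.

    Responses of "not differentially associated" are simply not counted,
    following the description in the text.
    """
    total = freq_men + freq_women
    if total == 0:
        raise ValueError("no differential responses for this item")
    return 100.0 * freq_men / total


# Hypothetical counts: 60 raters chose "men", 20 chose "women"
score = m_percent(freq_men=60, freq_women=20)
print(score)
```

An item chosen three times as often for men as for women thus lands well above the midpoint of 50, while an item chosen equally often for both lands exactly on it.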

In the first report from this study (Williams et al., 1999), pancultural gender
stereotypes were examined by computing the mean M% score for each item across all
groups of student raters. The 79 items with mean M% scores of 67 and above (items
associated with men at least twice as often as with women) constituted the
pancultural male stereotype; the 56 items with mean M% scores of 33 and below
(items associated with women at least twice as often as with men) constituted the
pancultural female stereotype. The ACL-FF scoring system was then applied to
these two item sets to obtain the following mean Five Factor scores for the male and
female pancultural stereotypes, respectively: Extraversion, 3.23 and 2.95 (p < .10);
Agreeableness, 2.79 and 3.15 (p < .05); Conscientiousness, 3.43 and 2.89 (p < .001);
Emotional Stability, 3.11 and 2.79 (p < .01); and Openness to Experience, 3.27 and
2.95 (p < .05). Thus, the pancultural female stereotype was higher on Agreeableness
and the male stereotype was higher on the other four factors. 
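
The two steps of this analysis (selecting stereotype items by M% cut-off, then averaging their ACL-FF scores) can be sketched as follows. The four item score vectors are copied from the Appendix; the M% values, the function names, and the dictionary layout are hypothetical and serve only to illustrate the procedure.

```python
from statistics import mean

FACTORS = ("Ext", "Agr", "Con", "Ems", "Opn")

# ACL-FF scores from the Appendix, in the order Ext, Agr, Con, Ems, Opn
ACL_FF = {
    "active":       (4.77, 3.65, 4.19, 3.12, 4.29),
    "adaptable":    (4.19, 4.14, 4.08, 4.17, 4.43),
    "adventurous":  (4.76, 3.49, 3.40, 3.40, 4.78),
    "affectionate": (4.24, 4.48, 3.19, 3.74, 3.36),
}

# Hypothetical mean M% scores (NOT values from the study)
M_PCT = {"active": 70, "adaptable": 50, "adventurous": 80, "affectionate": 20}


def stereotype_items(m_pct, male_cut=67, female_cut=33):
    """Split items into male (M% >= 67) and female (M% <= 33) stereotype sets."""
    male = [item for item, m in m_pct.items() if m >= male_cut]
    female = [item for item, m in m_pct.items() if m <= female_cut]
    return male, female


def mean_factor_scores(items, scores):
    """Average the five factor scores over a set of items."""
    return {f: mean(scores[i][k] for i in items) for k, f in enumerate(FACTORS)}


male_items, female_items = stereotype_items(M_PCT)
print(male_items, mean_factor_scores(male_items, ACL_FF))
print(female_items, mean_factor_scores(female_items, ACL_FF))
```

Items between the two cut-offs ("adaptable" in this toy data) belong to neither stereotype and do not enter either mean.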

In a second report from this study (Williams et al., 2001), the data from each
country were analyzed separately using the local M% scores to identify the items
composing the focused male stereotype (M% of 67 and above) and the items
composing the focused female stereotype (M% of 33 and below). The two item sets in
each sample were scored using the ACL-FF system to yield mean Big Five scores
for each of the two gender stereotypes in that country. Relative to the female
stereotype, the male stereotype was higher in Conscientiousness (all 27 countries),
Openness to Experience (26 of 27 countries), Extraversion (24 of 27), and
Emotional Stability (24 of 27). On the other hand, the female stereotype was higher in
Agreeableness in 22 of the 27 countries. As would be expected, the grand means of
the individual country means for each of the two stereotypes revealed the same
pancultural patterns found in the earlier analyses, with the female stereotypes higher
on Agreeableness and the male stereotypes higher on the other four factors. 

An additional analysis involved computing an index of the degree to which the 
two stereotypes in each country were differentiated in terms of the Big Five factors. 
This differentiation index was found to be largest in Nigeria, Japan, and South Afri- 
ca, and smallest in Venezuela and France. Further analyses revealed that the diffe- 
rentiation scores were correlated with a number of cultural comparison variables; for 
example, the stereotypes tended to be more differentiated in countries where the 
prevailing sex-role ideology was more traditional (i.e., male dominant), in countries 
where fewer women entered higher education, and in countries where Schwartz 
(1994) found strong Hierarchy and Conservatism values. It was also found that 
stereotype differentiation was relatively low in countries where the female stereoty- 
pes were more favorable than the male stereotypes, and relatively high in countries 
where the male stereotype was more favorable. 

This re-analysis of the stereotypes in terms of the five factors should prove useful 
to scholars interested in relating the study of gender stereotypes to the growing 
literature on applications of the Five Factor model in other areas of personality and 
social psychology. 


The importance of psychological traits 


A second illustration of a research application of the ACL-FF scoring system is
found in a study of the relative importance of various psychological characteristics
in different cultures (Williams et al., 1998). University students in 20 countries rated
the importance of each of the 300 ACL adjectives, or their translated equivalents,
on a 1 to 5 scale ranging from "little or no importance" to "critical or outstanding
importance." Importance was defined as the degree to which an adjective describes a
more basic or central personality characteristic as opposed to a more superficial or
peripheral characteristic; more important adjectives are very informative (or
diagnostic) as to "what a person is really like"; less important adjectives are less
informative. It was found that the psychological importance ratings were highly reliable 
in each country. Inter-country correlations were positive but only moderate in mag- 
nitude suggesting substantial cultural variation in the importance assigned to various 
traits. 

The ACL-FF system was employed to determine the characteristics associated 
with psychological importance in each of the 20 countries. In each country, the mean 
ratings of psychological importance were correlated with each of the five factor 
scales across the 300 ACL items. These analyses revealed substantial between- 
country variations in the relative importance of the five factors. 
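
A minimal sketch of the correlational step, for one country and one factor, might look like this. The Agreeableness scores are taken from the Appendix; the importance ratings and all names in the code are hypothetical, and a real analysis would of course run across all 300 items rather than five.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Appendix Agreeableness scores for: active, adaptable, adventurous,
# affectionate, aggressive
agreeableness = [3.65, 4.14, 3.49, 4.48, 2.50]

# Hypothetical mean importance ratings for the same five items in one country
importance = [3.1, 3.8, 3.0, 4.2, 2.6]

r_agr = pearson_r(agreeableness, importance)
print(round(r_agr, 2))
```

Repeating the computation for each of the five factor scales yields five correlations per country, which is the raw material for the country patterns discussed next.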

In some countries, psychological importance was found to be equally associated
with all of the five factors. In other countries, certain factors were more important
than others. The 10 countries with the most highly differentiated patterns of
association between five-factor scores and psychological importance scores are shown in
Table 3. For example, in Hong Kong, Agreeableness (A) was much more important
than Conscientiousness (C), Extraversion (E), and Emotional Stability (S), which, in
turn, were much more important than Openness to Experience (O). 


Table 3. Relative strength of the relationship of psychological importance scores to each of the five 
factor scores in 10 countries (a) 


Country      Pattern 

Australia    EA >> SCO 
Hong Kong    A >>> CES >>> O 
India        ACE [symbol illegible] SO 
Japan        C >>> ASEO 
Korea        CA >> ES >> O 
Nepal        A > E >> S > O 
Nigeria      [pattern illegible] 
Pakistan     A > C > ES >>> O 
Singapore    S >>> EC > AO 
Venezuela    [pattern illegible] 


(a) Difference in common variance between adjoining factors: > = 5-9%; >> = 10-14%; >>> = 15% and 
up. 
(b) E = Extraversion; A = Agreeableness; C = Conscientiousness; S = Emotional Stability; O = Openness 
to Experience. 
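
The pattern notation used in Table 3 can be illustrated with a short sketch. It assumes that "common variance" means the squared correlation expressed as a percentage, that factors are ordered from strongest to weakest, and that adjoining factors differing by less than 5% are written together without a symbol; these are readings of the table note, not statements from the original study. The correlations below are hypothetical and were chosen only so that the output has the same shape as the Hong Kong row.

```python
def importance_pattern(r_by_factor):
    """Build a Table 3 style pattern string from factor-importance correlations.

    r_by_factor maps a factor letter (E, A, C, S, O) to a correlation.
    """
    # Common variance in percent, assumed here to be 100 * r squared
    shared = {factor: 100 * r * r for factor, r in r_by_factor.items()}
    ordered = sorted(shared, key=shared.get, reverse=True)

    parts = [ordered[0]]
    for prev, cur in zip(ordered, ordered[1:]):
        gap = shared[prev] - shared[cur]
        if gap >= 15:
            parts.append(" >>> ")
        elif gap >= 10:
            parts.append(" >> ")
        elif gap >= 5:
            parts.append(" > ")
        # gaps under 5% get no symbol: the letters are simply run together
        parts.append(cur)
    return "".join(parts)


# Hypothetical correlations (NOT the values behind Table 3)
example = {"A": 0.60, "C": 0.45, "E": 0.44, "S": 0.42, "O": 0.10}
print(importance_pattern(example))
```

Reading the output from left to right gives the same kind of statement made in the text: one factor clearly dominant, a cluster of three roughly equal factors, and one factor far behind.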

In sum, this application of the ACL-FF system enabled us to better understand the 
cross-cultural variations in the importance attached to various psychological cha- 
racteristics. 


Other potential applications 


The ACL adjectives constitute a general set of person descriptors, which can be used 
in many different ways to describe real or imaginary individuals or groups. The wide 
variety of research questions that can be addressed via the ACL item pool, pre- 
viously summarized by Williams and Best (1983), includes descriptions of individu- 
al persons (e.g., politicians, spouses, children, etc.), descriptions of groups of per- 
sons (e.g., traits that collectively characterize groups of persons such as successful 
employees or successful students, individuals with clinical diagnosis A versus clini- 
cal diagnosis B, etc.), social stereotypes (i.e., subjects might be asked to describe 
their beliefs about the psychological characteristics of individuals within broad 
social or ethnic groups such as men versus women), historical figures (e.g., impres- 
sions of Stalin, Roosevelt, Churchill, Hitler), or even personified concepts. In the 
latter example, for instance, Gough and Heilbrun (1980, p. 40) report on studies 
comparing Fiat and Volkswagen automobiles and comparing the cities of Rome and 
Paris. 

Bassett and Williams (2000) recently employed the ACL scoring system in a stu- 
dy of personified concepts. Here, university students used the ACL item pool to 
describe the characteristics associated with God, Satan, and self in order to study the 
inter-relationships among these three concepts. With the availability of the ACL-FF 
scoring system, the results of studies of the aforementioned types can now be exa- 
mined in terms of the Five Factor model of personality. 


Information on computer scoring and translations 


Inquiries concerning computer scoring for the ACL-Five Factor system may be sent 
to: Jonathan F. Bassett, Dept. of Psychology, Georgia State University, Atlanta, GA 
30303-3083 (25075  panther.gsu.edu); or to John E. Williams, 4750 Bell Circle, 
S.E., Conyers, GA 30094 (jnwms@mediaone.net), or to Deborah E. Hill, Wake 
Forest University School of Medicine, Medical Center Blvd., Winston-Salem, NC 
27157 (dfhill@wfubmc.edu). 

Inquiries concerning translations of the ACL items to other languages may be 
sent to Deborah L. Best, Department of Psychology, Wake Forest University, Box 
7778, Winston-Salem, NC 27109 (best@wfu.edu). 
Appendix 


Five Factor scores for the 300 items of the Adjective Check List. Ext = Extraversion; Agr = 
Agreeableness; Con = Conscientiousness; Ems = Emotional Stability; Opn = Openness 


Item Ext Agr Con Ems Opn 

1. absent-minded 2.32 2127 197 2.56 212 
2. active 4.77 3.65 4.19 3.12 4.29 
3. adaptable 4.19 4.14 4.08 4.17 4.43 
4. adventurous 4.76 3.49 3.40 3.40 4.78 
5. affected 3.04 Soy, 3.14 2.18 3.06 
6. affectionate 4.24 4.48 3.19 3.74 3.36 
7. aggressive 3.91 2.50 3.90 27] 3.86 
8. alert 4.10 3.61 4.32 3.07 3.93 
9. aloof 2.19 2.21 2.40 2.65 2.50 
10. ambitious 4.35 3.39 4.79 3:35 4.11 
11. anxious 2.98 2.62 3.38 1.37 3.14 
12. apathetic 1.93 2.19 21 3.04 2515 
13. appreciative 3.85 4.38 3.56 3.62 3.76 
14. argumentative 215 1.80 3:22 1.84 2.87 
15. arrogant 2.47 1.50 2.76 2.68 2.44 
16. artistic 3.43 3.19 3.26 3.14 4.26 
17. assertive 4.53 3.26 4.44 3.21 3.78 
18. attractive 3.57 3.21 3.07 Bal 3.23 
19. autocratic 3.04 2.69 3.49 2.86 2.97 
20. awkward 2.00 2.45 2.36 2:25 2.50 
21. bitter 1.42 1.47 2.33 1.61 2.09 
22. blustery 2.60 2.38 2.58 2.45 2.76 
23. boastful 2.73 1.59 2.78 3.04 2.79 
24. bossy 2.87 1.65 3.09 2.58 2.51 
25. calm 3.02 4.00 3.65 4.27 3.10 
26. capable 3.03 4.01 4.64 3.39 3.89 
27. careless 2.39 1.88 1.40 2.47 271 
28. cautious 2.57 3.43 3.99 3.24 2.33 
29. changeable BS 3.49 3.10 3.07 4.19 
30. charming 4.11 3.98 3:21 3.68 3.39 
31. cheerful 4.67 4.21 9132 4.33 3.64 
32. civilized 3.76 3.88 3.96 3.42 3.18 
33. clear-thinking 3.72 4.11 4.68 4.13 3.66 
34. clever 3.93 3.60 4.28 3.28 3.93 
35. coarse 2.09 2.03 2.60 224. 2.56 
36. cold 1535 1.40 255 222 27 
37. commonplace 2.25 2.83 2-73 3.06 1.82 
38. complaining 1.71 171 235 1.82 2.04 
39. complicated 2.59 2.59 9115 2.14 3.06 
40. conceited 2.36 1.41 2.61 2.89 2.40 
41. confident 4.47 3.82 4.44 4.17 4.26 
42. confused 22] 2.39 1.94 1.99 2з] 


. conscientious 
. conservative 
. considerate 
. contented 

. conventional 
. cool 

. Cooperative 
. courageous 
. cowardly 

. cruel 

. curious 

. cynical 

. daring 

. deceitful 

. defensive 

. deliberate 

. demanding 
. dependable 
. dependent 
. despondent 
. determined 
. dignified 

. discreet 

. disorderly 
. dissatisfied 
. distractible 
. distrustful 

. dominant 

. dreamy 

. dull 

. easy-going 
. effeminate 
. efficient 

. egotistical 
. emotional 

. energetic 

. enterprising 
. enthusiastic 
. evasive 

. excitable 

. fair-minded 
. fault-finding 
. fearful 

. feminine 

. fickle 

. flirtatious 

. foolish 

. forceful 

. foresighted 
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3.61 
2.50 
4.04 
3,55 
22 
3.41 
3.87 
4.18 
1.67 
1.52 
4.17 
1.86 
4.24 
1.87 
2.26 
3:24 
2:99 
3.89 
2.48 
2.17 
4.35 
3.65 
2.67 
2.41 
2.21 
2.60 
1.91 
3.84 
3.36 
1.57 
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2215 
3205 
2.66 
3.67 
4.7] 
4.24 
4.77 
222 
441 
3.56 
29 
1.93 
2.81 
2.48 
3-73 
2:97 
3.16 
3.30 


4.00 
3:31 
4.72 
3.81 
3319 
3.14 
4.43 
3.59 
2:93 
1.20 
3.61 
1.90 
313 
1:32 
220 
3.03 
2.18 
4.59 
2.53 
2539 
AT 
3:57 
3.42 
2.30 
2.16 
2.34 
1.34 
2.26 
312 
2.44 
4.09 
3.14 
375 
1.65 
3.70 
3.87 
3.60 
4.13 
2AB 
3:53 
4.02 
1.62 
2.36 
3:25 
228 
2.97 
2:21 
MIS 
3.43 


4.37 
3.51 
3.56 
3.31 
3.41 
3.26 
4.14 
3.88 
2.13 
2.37 
3.88 
2.68 
3.41 
2.18 
2.94 
4.09 
3.78 
470 
2.31 
2.50 
4.70 
3.82 
3.15 
1.23 
2.45 
1.95 
2.03 
3.73 
2.65 
2.59 
3.06 
2.90 
4.73 
2.93 
2.88 
4.09 
4.39 
4.14 
2.55 
3.51 
3.68 
2.97 
2.48 
2.81 
2.48 
2.78 
1.97 
3.51 
4.14 


2:99 
3.20 
3:72 
4.27 
3.51 
3-33 
3:79 
3.63 
2.40 
2.40 
3.02 
1.82 
3.20 
2.47 
1.63 
2.90 
2.29 
3.81 
2.14 
2.28 
3335 
3.67 
3.58 
22] 
[ЛЇЇ 
2.13 
2.18 
3.08 
2.78 
3.00 
4.16 
3.11 
3.63 
2.84 
1.66 
3.28 
3.55 
3.80 
293 
2.54 
3:75 
1-91 
1.91 
3.07 
2120 
328 
2.61 
205 
3.58 


3.21 
1.68 
3.41 
22 
1.90 
3:22 
8:75 
4.43 
1.49 
2.34 
4.72 
210 
4.61 
2.56 
2:27 
205 
3.05 
3.15 
2:51 
2:55 
4.02 
325 
2.63 
2:92 
321 
3.05 
2.30 
3.04 
4.00 
1.74 
3.86 
2.90 
3:20 
2.70 
3.45 
4.40 
4.38 
4.42 
2.44 
4.11 
3.56 
2.09 
1.78 
2.92 
2.58 
3.44 
2.95 
3.15 
3.28 
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forgetful 


. forgiving 


formal 


. frank 


friendly 
frivolous 


. fussy 
. generous 


gentle 
gloomy 
good-looking 
good-natured 
greedy 
handsome 
hard-headed 
hard-hearted 
hasty 
headstrong 
healthy 
helpful 
high-strung 
honest 
hostile 
humorous 
hurried 
idealistic 
imaginative 
immature 
impatient 
impulsive 
independent 
indifferent 
individualistic 
industrious 
infantile 
informal 
ingenious 
inhibited 
initiative 
insightful 
intelligent 


interests narrow 


interests wide 
intolerant 
inventive 
irresponsible 
irritable 


. jolly 


kind 


2:17 
4.50 
297 
3.88 
4.58 
2.45 
1.96 
4.38 
4.46 
1:95 
ST 
4.34 
1.59 
3.08 
1:97 
1.50 
2.28 
2.64 
3.44 
4.59 
2.38 
4.79 
1.43 
3.68 
2.47 
3.41 
3.61 
2.08 
1.58 
2.60 
3.23 
1.86 
3.08 
3.50 
2.27 
3.05 
3.36 
2:37 
3.79 
3.94 
3.74 
2.09 
3/96 
1.44 
3:53 
1:79 
NS 
4.06 
4.65 


1.47 
3.42 
X51 
3.72 
3.62 
2:15 
2.86 
3.45 
3.19 
232 
3.03 
3.67 
2.13 
3.03 
371 
2.89 
2.18 
3.90 
3.60 
4.03 
ЗЛ 
3.96 
2.46 
3.21 
2.65 
3.65 
3.69 
1.89 
2.42 
232 
4.30 
2.34 
3.90 
4.68 
2.04 
2.63 
3.87 
2.64 
4.59 
4.29 
4.44 
2:32 
3:69 
2:65 
4.02 
1.30 
2.43 
3.27 
3:35 


2.61 
3.83 
3592 
2.83 
3.81 
2:89 
1.99 
3.68 
3.83 
1.65 
3.29 
3192 
2.61 
320 
2.30 
207 
2.21 
2.83 
2.67 
2.78 
1.90 
3.43 
1.54 
3:75 
2 IB 
3.23 
3.11 
226 
1.74 
1.62 
3.66 
3.13 
3:19 
352 
22] 
3.01 
2.43 
2.45 
3.62 
3:51 
3.10 
2.41 
3:59 
1.94 
3199 
2.30 
1.46 
4.26 
то 
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141. lazy 1.76 227 1.38 262 2.01 
142. leisurely 3.00 3.17 2.16 3.40 3.05 
143. logical 3.34 3.60 4.38 3.70 315 
144. loud 3.64 2.54 2.94 2.60 3.19 
145. loyal | 3.87 4.64 3.79 3.46 2195 
146. mannerly 3.49 3.78 3.89 3.67 2.89 
147. masculine 3.07 2193 3.12 3.05 3.07 
148. mature 3.49 4.01 4.27 3.88 3.47 
149. meek 1.91 2.98 2.62 2:92 2:30 
150. methodical 2.88 3.18 4.37 327 2.46 
151. mild 2.42 3.36 2.97 3.70 242 
152. mischievous 3.35 225 - 2.34 2.54 3.63 
153. moderate 3774 / 3:35 324 3:32 2.84 
154. modest 2.76 4.21 327 -3S6 ZGN 
155. moody 2.21 2.06 2.61 1:35 2:73 
156. nagging 1.92 1.74 2.64 1.96 2.40 
157. natural 3.81 3.74 3:27 3.50 3:72 
158. nervous 21 2.44 2.70 1:72 222 
159. noisy 3.41 2.40 2:21 2:52 3.09 
160. obliging зз] 3.88 3.40 3:33 3.20 
161. obnoxious 2.03 1.68 232 2117 2.71 
162. opinionated 351 2.44 3.48 22 2.96 
163. opportunistic 4.04 3.09 4.22 3.09 4.12 
164. optimistic 4.53 4.22 4.14 4.32 4.19 
165. organized 3.61 a 4.87 3.64 3.14 
166. original 4.07 3.62 3.92 3.38 4.35 
167. outgoing 4.87 4.19 3.89 3.93 4.37 
168. outspoken 4.32 3.03 3:67 3.09 3.68 
169. painstaking 2-11 2.93 3.53 2.85 2.76 
170. patient 3.18 4.42 3.82 4.25 3.44 
171. peaceable 3.55 4.39 3.63 4.13 3.47 
172. peculiar 2.89 2.81 2.97 2:25 3.60 
173. persevering 3.76 3.65 4.49 3:37 3:01 
174. persistent 4.11 3.42 4.70 312 3.64 
175. pessimistic 1.60 1.67 2.03 1.58 1.80 
176. planful 3.44 3.57 4.59 3.62 2.85 
177. pleasant 4.17 4.33 3.45 4.10 255 
178. pleasure-seeking 4.73 3.56 3.05 3.24 4.30 
179. poised 3.65 3:52 3.85 3.70 322 
180. polished 3.54 3.38 3.89 3.89 3.07 
181. practical 3:25 3.72 4.40 321 2.74 
182. praising 3.85 4.22 3.36 3.95 3.38 
183. precise 325 3:37 4.46 3.49 2.88 
184. prejudiced 2.04 1.64 2.42 2.45 LU 
185. preoccupied 2.56 2.16 2.67 2.01 2.19 
186. progressive 3.78 3.34 3:73 3.36 4.22 
187. prudish 2.05 2.48 2.88 2.81 1.93 
188. quarrelsome 1.95 1.34 2.46 LS 2.37 


189. queer 2.15 22] 2.58 2.59 2.99 
quick 
quiet 
quitting 


. rational 


rattlebrained 
realistic 
reasonable 
rebellious 
reckless 
reflective 
relaxed 
reliable 
resentful 
reserved 
resourceful 
responsible 
restless 
retiring 
rigid 

robust 

rude 
sarcastic 
self-centered 
self-confident 
self-controlled 
self-denying 
self-pitying 
self-punishing 
self-seeking 
selfish 
sensitive 
sentimental 
serious 
severe 

sexy 
shallow 
sharp-witted 
shiftless 
show-off 
shrewd 

shy 

silent 

simple 
sincere 
slipshod 
slow 

sly 

smug 
snobbish 
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3.78 3.19 3.59 3.01 3.50 
1.69 3.06 2.90 3.36 2.47 
1.54 1.83 1.24 2.34 1.90 
3.44 3.71 4.34 4.11 3.02 
2.39 UH 1.65 2.09 2.66 
322 3.66 4.21 3.87 2.90 
3.70 4.05 4.26 4.08 3.34 
3.26 2.06 2.19 211 3.79 
3.05 1.91 1.68 2.11 3.60 
3313 3.70 3.74 3.34 3.32 
3.36 3.84 3.18 4.40 3.58 
3.78 4.52 4.77 3.96 3.16 
1.66 1.55 2.24 1.81 2.24 
1:67 2.96 3.00 3.18 1.80 
4.14 3.58 4.58 3.59 4.02 
3.83 4.30 4.89 507/1 3:32 
355 2.45 2.68 1.88 ЗЮ 
1.65 2.64 2.34 3.24 2.19 
1.87 1.90 3113 2.2] 1.61 
3.45 2.96 3.06 3.16 3.42 
1.82 1.25 2.38 2.0 2.60 
2.28 1.64 2.60 1.90 2.62 
2.36 ДАЈ 2.69 2.20 2.42 
4.42 3.78 4.32 4.03 4.22 
3.63 3.88 4.46 3.98 3.47 
252 2.93 204 2.74 2.16 
1.75 1.89 2.09 1.83 2.02 
1.83 2.18 2.79 2.05 2.24 
3.12 2.59 3.41 2-75 3.91 
2l 1.37 2637 2.26 25] 
3.87 4.63 3.38 2.87 3.50 
3.56 4.07 3.05 3812 3.10 
272 3.49 4.33 2.54 2.76 
2.03 2.19 3.03 ZI 2.47 
3.44 3.06 2.94 3.32 3.38 
218 1.75 2.42 2.65 1.85 
3.82 3.43 3.90 3.17 3.80 
2.50 2.34 2.63 2.75 2.61 
3.44 1.81 2.70 3418 3:38 
2.65 2.74 3255 2381 3.03 
1.40 2.69 2.51 2.53 1.84 
1.42 2.48 2.47 2.84 2515 
2.48 3.18 2.85 3.39 2.42 
3.84 4.60 3575 3.48 3.43 
2158 2.40 2.34 2.74 2.80 
2.01 272 203 3.03 253] 
2.97 2517 2.93 2.70 3:25 
9133 1.93 2.70 Dail 255 
6 1.42 2755 2.78 2.06 
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239. sociable 4.85 4.20 3.54 3.99 4.23 
240. soft-hearted 3.88 4.52 312 3.58 3.40 
241. sophisticated 3.41 3.36 3.70 3.49 3.47 
242. spendthrift 2.91 2:15 2.93 2.92 3.00 
243. spineless 1.90 1.94 2.09 2:53 1.84 
244. spontaneous 4.50 3.38 2.89 2.68 4.63 
245. spunky 4.49 3.45 3.15 3.40 4.27 
246. stable 3.47 3.94 4.28 4.61 2.86 
247. steady 3.40 2:095 4.30 4.47 2.74 
248. stern 241 2.56 3.48 2.82 2.41 
249. stingy 1.91 1.80 2.74 2.70 247 
250. stolid 2.46 2.70 3.03 3.15 2.62 
251. strong 3.84 3.67 3.95 3.62 9159 
252. stubborn 2.70 221 3.43 2:37 2.08 
253. submissive 1.90 2.81 2.18 2.86 2:39 
254. suggestible 3:39 385 3.21 2.87 3:5] 
255. sulky 1.84 2.07 292 1.81 2:88 
256. superstitious 203 2.58 2.63 2.40 2.55 
257. suspicious 2.34 2.07 2.87 1.93 237 
258. sympathetic 3.83 4.59 3.20 432 3.42 
259. tactful 3.71 4.07 4.09 3.58 3.38 
260. tactless 2,13 1:75 1.91 2.46 2.61 
261. talkative 4.58 8:97 3.32 3.12 3.70 
262. temperamental 2.84 2.34 2.65 1.69 2.83 
263. tense 2.11 2.30 3.11 1.69 2.24 
264. thankless 2.07 1.72 2.30 252 2.42 
265. thorough 3:50 3.62 4.74 3:33 3.34 
266. thoughtful 4.14 4.66 3.93 3.65 3.65 
267. thrifty 3.04 2.97 3.66 3.07 3.05 
268. timid 1.47 2.65 2.38 2.69 1.85 
269. tolerant 3.70 4.28 3.50 3.95 3.85 
270. touchy 2:39 283 2.65 Toon 2:87 
271. tough 3.28 2.90 3.74 3019 665 
272. trusting 3.97 4.52 3.73 3.81 3:77 
273. unaffected 2.63 2:35 2.98 3.96 3.00 
274. unambitious 1.61 2:32 1.31 2.80 1.75 
275. unassuming 2.61 2.99 2:35 3.50 2.85 
276. unconventional 3.16 2.79 2.60 2.82 3.97 
277. undependable 2.01 1.48 17 2832 2.69 
278. understanding 4.13 4.71 3.80 3.67 3.74 
279. unemotional 1.72 1.94 2291 4.20 2.42 
280. unexcitable 1.49 2.19 2.64 3.94 1.76 
281. unfriendly 1:27 1.38 2.45 357 2.18 
282. uninhibited 3.55 3.04 2.76 327 4.04 
283. unintelligent 2.13 2.34 1.54 312 2.45 
284. unkind 1.61 1.30 2.41 2.57 2.38 
285. unrealistic 2.56 2.45 1.87 2:37 3.18 
286. unscrupulous 2.39 2.20 2.40 2.85 2.87 


287. unselfish 3.65 4.31 3.27 3.65 3.55 


288. 
289. 
290. 
291. 
292. 
293. 
294. 
295. 
296. 
297. 
298. 
299. 
300. 
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unstable 
vindictive 
versatile 
warm 
wary 
weak 
whiny 
wholesome 
wise 
withdrawn 
witty 
worrying 
zany 


231 
1.90 
4.13 
4.56 
2:25 
175 
I3 
3.60 
3.45 
1.36 
4.06 
2.03 
4.20 


2 15 
sl 
3.83 
4.51 
2.44 
2.47 
P73 
3.90 
3.89 
249 
3.58 
2.49 
3.18 


1.69 
2.48 
4.04 
3.38 
3.03 
1199 
2.13 
3:95 
4.40 
2.46 
3.61 
2.94 
2.87 


1.48 
2.08 
3/52 
3.83 
219 
2.43 
1.88 
358 
3.47 
2.07 
3.36 
1.61 
3.24 


2:99 
2.62 
4.54 
3.57 
225 
217 
2.01 
3.07 
37/3 
1.84 
3.61 
1.96 
4.00 
The MMPI-2 Personality Psychopathology-Five 
(PSY-5) scales and the Five Factor Model 


John L. McNulty 
Allan R. Harkness 


Introduction 


Recent sources detailing personality assessment options (e.g., Butcher & Rouse, 
1996; Butcher & Williams, 2000; Friedman, Lewak, Nichols, & Webb, 2001; 
Greene, 2000; Millon & Davis, 2000; Widiger & Trull, 1997) have described the 
Personality Psychopathology - Five scales (PSY-5; Harkness, McNulty, & Ben- 
Porath, 1995) that can be scored from the item responses to the Minnesota Multipha- 
sic Personality Inventory-2 (MMPI-2; Butcher et al., 2001). Although mention of 
the MMPI-2 causes many psychologists to reflexively think of empirical scale
construction, the MMPI-2 PSY-5 scales were constructed in a process that is the polar 
opposite of contrasted-groups empirical construction. In the development of the 
PSY-5, psychological theory, hence trait constructs were developed first, followed 
by the construction of MMPI-2 scales designed to optimize quantified communicati- 
on (Harkness & Hogan, 1995; Harkness, in press) between the test-taker and test- 
interpreter. In this chapter, we describe the PSY-5 and compare them with the Five 
Factor Model (FFM). Next, we detail the development of the PSY-5 theoretical con- 
structs (Harkness & McNulty, 1994) from markers of normal personality (Tellegen, 
1982) and fundamental topics in the personality disorders (Harkness, 1992). We 
then describe the procedures used to build MMPI-2 PSY-5 scales optimized for 
quantified communication (Harkness, McNulty, & Ben-Porath, 1995). Psychometric 
properties and summary of recent validity studies are presented next, followed by 
general administration and scoring recommendations. Finally, we present guidelines 
for clinical interpretation of the MMPI-2 PSY-5 scales. This chapter complements a 
University of Minnesota Press Test Report on the PSY-5 (Harkness, McNulty,
Ben-Porath, & Graham, in press). 
The PSY-5 theoretical constructs: Similarities to and 
distinctions from the FFM 


Because many readers are familiar with the FFM, particularly as implemented by 
Costa and McCrae (1992) in the NEO-PI-R, the most pressing need is to begin this 
chapter with a description of the PSY-5 theoretical constructs, and a discussion of 
how the PSY-5 constructs are similar to and distinct from the FFM constructs. 


PSY-5 Aggressiveness 


Aggression can take on many forms. PSY-5 Aggressiveness focuses on aggression 
that is used to accomplish goals or intimidate others. PSY-5 Aggressiveness does 
not emphasize the aggression seen when one is cornered or reacting to the aggressi- 
on of others. Interpersonally, high PSY-5 Aggressiveness is linked with dominance 
and hate. 


PSY-5 Psychoticism 


Some patients with personality disorders show some degree of disconnection from 
reality. This is seen, for example, in schizotypal (but not schizoid), and paranoid 
personality disorders. Some patients with borderline personality disorder have mi- 
cro-psychotic episodes in which they appear to take leave of reality for a circumscri- 
bed period of time. The grandiose self-evaluations of some narcissistic personality 
disorder patients are clearly at odds with reality. Although "degree of connection to 
reality" has not been classically considered a part of personality, it clearly colors the 
effects of all other personality variables. 

The PSY-5 dimension of Psychoticism assesses this degree of disconnection from 
reality. Unshared beliefs, as well as unusual sensory and perceptual experiences are 
examples of disconnection. Alienated and unrealistic expectations of harm from 
others are also assessed. PSY-5 Psychoticism is phenotypic: it is not linked to any 
specific etiology. 


PSY-5 Disconstraint 


Tellegen's (1982) Constraint concept led to identifying PSY-5 Disconstraint (the 
current PSY-5 name is reversed from the original name, Constraint) in aggregated
normal personality and personality psychopathology markers (Harkness & 
McNulty, 1994). This construct has been further described by Watson and Clark 
(1993). The high Disconstraint person is more open to physical risk taking, more 
spontaneous and less controlled, and less rule bound than the more constrained
person. 


PSY-5 Negative Emotionality/Neuroticism 


The personality disposition to experience negative (valence) affects and emotions 
was articulated by Tellegen (1982) and further described in Watson and Clark's 
(1984) review. To focus on problematic features of incoming information, to worry, 
to be self-critical, to feel guilty, and to concoct worst-case scenarios are common 
features of elevated Negative Emotionality/Neuroticism. 


PSY-5 Introversion/Low Positive Emotionality 


Although linked with the corresponding social dimension of Introversion versus 
Extraversion, Tellegen (1982; 1985) and Watson and Clark (1997) argued
persuasively that the core of this individual differences dimension is affective. People
differ in the readiness to experience the positive emotions. We retain both labels to
emphasize the link between the two. The current label of PSY-5 Introversion/Low
Positive Emotionality is reversed from the original label of Extraversion/Positive
Emotionality. This change of label was made to allow the assignment of MMPI-2
Uniform T scores (Tellegen & Ben-Porath, 1992), for which any skew must be aligned as positive. 


Comparing the PSY-5 and the Five Factor Models 


Harkness and McNulty (1994) suggested that these PSY-5 constructs are linked to, 
yet distinct from, other personality trait models such as the FFM. They noted that 
both PSY-5 and FFM shared the Negative Emotionality/Neuroticism and Introversi- 
on/Low Positive Emotionality (i.e., reversed Extraversion) constructs. The authors 
asserted that although PSY-5 Aggressiveness shares some features of FFM reflected 
Agreeableness, the PSY-5 construct emphasizes more extreme aggression, cruelty, 
and violence. Finally, Harkness and McNulty argued that PSY-5 Disconstraint is not 
comparable to FFM Conscientiousness, and that PSY-5 Psychoticism is not tapped 
by the FFM. 

Widiger and Trull (1997) subsequently compared the FFM and the PSY-5. They 
acknowledged the similarity of FFM Neuroticism and Extraversion to PSY-5
Negative Emotionality/Neuroticism and Introversion/Low Positive Emotionality,
respectively, and focused most of their discussion on PSY-5 Aggressiveness,
Disconstraint, and Psychoticism. They concluded that high PSY-5 Aggressiveness may be 
similar to low FFM Agreeableness, but that opposite ends of the constructs may be 
dissimilar. Widiger and Trull suggested that the difference between FFM
Conscientiousness and PSY-5 Disconstraint may result from a different organization of FFM 
facets. For example, Trull, Useda, Costa, and McCrae (1995) noted the relation
between NEO PI-R Excitement Seeking, a facet of Extraversion, and PSY-5
Disconstraint (r = .41; sign of correlation reversed to reflect PSY-5 renaming).
Furthermore, they argued that the MMPI-2 item pool is limited in having a small number of 
items tapping FFM Conscientiousness (e.g., Costa, Zonderman, McCrae, &
Williams, 1985). Finally, Widiger and Trull acknowledged the clear difference between 
PSY-5 Psychoticism and FFM Openness to Experience. In general, Widiger and 
Trull's comparison of FFM and PSY-5 reached conclusions similar to those in
Harkness and McNulty (1994). 


1 Many of the studies summarized in this chapter were conducted using the original scoring direction 
and names for the Constraint and Extraversion/Positive Emotionality scales. For clarity and
consistency of presentation, results of those studies are reported here using the current scoring direction 
and the Disconstraint and Introversion/Low Positive Emotionality scale names, respectively. The
direction of relations with extratest data involving these two scales has been reversed from that reported 
in the original source, where appropriate. 


History of the development of the PSY-5 constructs 


The development of the PSY-5 began in 1989 with Allan Harkness's (1989 / 1990) 
University of Minnesota doctoral dissertation. From a pool of symptoms and cha- 
racteristics of both normal personality functioning and personality disorder, 60 ma- 
jor topics in human personality were identified. Thirty-nine of these were derived 
from symptoms of the DSM-III-R personality disorders (PD) and Cleckley's (1982) 
descriptors of psychopathy and 26 were derived from characteristics of normal per- 
sonality included in Tellegen's (1982) Multidimensional Personality Questionnaire 
(five of the PD markers duplicated normal personality markers and were fused into 
single items). Harkness (1992) published the development of the 39 PD "funda- 
mental topics," and 26 normal personality markers, documenting their similarity to 
inclusive marker sets generated by Clark (1990) and Livesley, Jackson, and Schroe- 
der (1989). These topics comprised an inclusive set of topics for "facet" level con- 
struction of personality disorder assessment. 

Harkness and McNulty (1994) used the 60 fundamental topics of PDs and normal 
personality to develop the PSY-5 constructs. They reported eigen-vector analyses of 
a psychological distance matrix generated from the judgments of 201 lay persons. 
The summed similarity matrix collected short and long psychological distances as 
well as metrifying opposite relationships among the 60 markers of normal and 
disordered personality (slightly over a million independent judgments of lay people). 
Following these analyses, the authors described the five highest-order marker aggre- 
gates, labeled them the PSY-5, and linked them to relevant literature. The PSY-5, as 
described above, provided five broad individual differences vectors for distinguis- 
hing personalities. 


The PSY-5 scale development 


With the PSY-5 constructs in hand, Harkness, McNulty, and Ben-Porath (cf., 1995) 
set about building scales to measure them from items of the MMPI-2 (Butcher, Gra- 
ham, Dahlstrom, Tellegen, & Kaemmer, 1989) item pool. The MMPI-2 offered se- 
veral potential advantages. First, the MMPI-2 had been used in a wide variety of 
clinical, forensic, and correctional contexts as a measure of personality functioning 
and psychopathology. Therefore, extensive data were already available. Second, the 
567 items of the MMPI-2 cover a wide range of personality characteristics and 
symptoms of psychopathology. Third, the MMPI-2 had been standardized in 1989, 
providing a large normative sample representative of the population of the United 
States. Fourth, the MMPI-2 contains validity scales that allow assessment of the 
respondent's test-taking approach. Persons evidencing content nonresponsiveness, 
or content responsive distortions (Berry, 1995; Nichols, Greene, & Schmolck, 1989) 
could be identified. 

Harkness, McNulty and Ben-Porath (1994) developed Replicated Rational Selec- 
tion (RRS) for identifying potential PSY-5 items. In RRS, many persons were taught 
the psychological features contained in each of the PSY-5 constructs. These trained 
item selectors then examined the entire MMPI-2 item pool for candidate items for 
each construct. Items for which a majority of selectors replicated each other's judg- 
ments were then further examined. The candidate items were reviewed to ensure 
they could be clearly keyed, were not projective in nature, and were relevant to only 
one construct. Items that failed this review were eliminated. The resultant scales 
were psychometrically analyzed in four samples, and items that evidenced poor in- 
ternal consistency or were more strongly correlated with another scale were deleted. 

The item composition and scoring direction of the final PSY-5 scales are shown 
in Table 1, and item examples for each of the scales are shown in Table 2. 
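
As a rough illustration of how a true/false keyed scale of this kind is scored, the following Python sketch adds one raw-score point for each response that matches the keyed direction. The function name `raw_score`, the dictionary layout of the responses, and the five-item key are hypothetical and chosen only for illustration; they are not the PSY-5 keys.

```python
def raw_score(responses, true_keyed, false_keyed):
    """Count the responses that match the keyed direction.

    responses   -- dict mapping item number to True or False
                   (omitted items contribute nothing to the score)
    true_keyed  -- item numbers scored when answered True
    false_keyed -- item numbers scored when answered False
    """
    score = 0
    for item in true_keyed:
        if responses.get(item) is True:
            score += 1
    for item in false_keyed:
        if responses.get(item) is False:
            score += 1
    return score


# Hypothetical five-item key, for illustration only.
TRUE_KEYED = {101, 102, 103}
FALSE_KEYED = {201, 202}

answers = {101: True, 102: False, 103: True, 201: False, 202: True}
print(raw_score(answers, TRUE_KEYED, FALSE_KEYED))
```

Under this scheme an unanswered item simply adds nothing, which is one reason a complete administration matters for obtaining interpretable raw scores.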


Psychometric properties 


Harkness, McNulty, and Ben-Porath (1995) reported the psychometric properties of 
the PSY-5 scales using the MMPI-2 normative sample (Butcher et al., 1989), a 
college sample, and three clinical samples. Across the five samples, coefficient 
alphas ranged from .65 to .88, with higher alphas found in samples having greater 
base rates of psychopathology. In the MMPI-2 normative sample (for men, N = 1,121; 
for women, N = 1,446) coefficient alphas ranged from .65 (Disconstraint) to .84 
(Negative Emotionality/Neuroticism) for men, and from .65 (Aggressiveness and 
Disconstraint) to .84 (Negative Emotionality/Neuroticism) for women. 
Intercorrelations in the normative sample were lower than those found among many 
sets of MMPI scales (Watson & Clark, 1984). The strongest intercorrelation is 
between Psychoticism and Negative Emotionality/Neuroticism, at r = .52. This level 
of intercorrelation is directly comparable to the strongest intercorrelation 
reported between domain scores on the NEO PI-R, r = -.53 between Neuroticism and 
Conscientiousness (Costa & McCrae, 1992). The mean absolute correlation between 
the PSY-5 scales was r = .25 (range = -.05 to .52) in the MMPI-2 normative sample, 
and is comparable to the domain level intercorrelation found in the NEO PI-R (mean 
absolute r = .21). Trull et al. (1995) reported six-month test-retest stabilities 
ranging from .62 (Aggressiveness) to .86 (Disconstraint) in a sample of 44 clinic 
outpatients. Harkness, Spiro, Butcher, and Ben-Porath (1995) reported five-year 
test-retest stabilities for the PSY-5 scales that ranged from .69 (Psychoticism) to 
.82 (Negative Emotionality/Neuroticism) in the Boston VA Normative Aging Study 
samples. 


Table 1: Scoring Keys for the PSY-5 Scales 

AGGR - Aggressiveness (18 items) 
True   27 50 85 134 239 323 324 346 350 358 414 423 452 521 548 
False  70 446 503 

PSYC - Psychoticism (25 items) 
True   24 42 48 72 96 99 138 144 198 241 259 315 319 336 355 361 374 448 466 490 
       508 549 551 
False  184 427 

DISC - Disconstraint (29 items) 
True   35 84 88 103 105 123 209 222 250 284 344 362 385 412 417 418 431 477 
False  34 100 121 126 154 263 266 309 351 402 497 

NEGE - Negative Emotionality/Neuroticism (33 items) 
True   37 52 82 93 116 166 196 213 290 301 305 329 375 389 390 395 397 407 409 415 
       435 442 444 451 513 542 556 
False  63 223 372 405 496 564 

INTR - Introversion/Low Positive Emotionality (34 items) 
True   38 56 233 515 517 
False  9 49 61 75 78 86 95 109 131 174 188 189 207 226 231 244 267 318 330 340 342 
       343 353 356 359 370 460 531 534 

Note: From MMPI-2 (Minnesota Multiphasic Personality Inventory - 2): Manual for Administration, 
Scoring and Interpretation (2nd ed.), by J. Butcher, J. Graham, Y. Ben-Porath, A. Tellegen, W. 
Dahlstrom, and B. Kaemmer, 2001. Copyright 2001 by the Regents of the University of Minnesota. 
Adapted and reprinted by permission. 

PSY-5 scale raw score means and standard deviations from the MMPI-2 
normative sample, by gender, are reported in Harkness et al. (in press), along with 
tables for converting raw scale scores to uniform T scores. Measures of each scale's 
internal consistency (Cronbach's coefficient alpha), standard errors of measurement, 
scale intercorrelations, and one-week test-retest correlations calculated from the 
MMPI-2 normative sample (by gender) are also provided in Harkness et al. (in press). 
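
For readers who wish to check internal consistency in their own data, the sketch below computes Cronbach's coefficient alpha from item-level scores using the standard formula, alpha = k/(k - 1) x (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the small response matrix are hypothetical; the data are made up for illustration and are not MMPI-2 responses.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Coefficient alpha for a list of respondents.

    item_scores -- list of rows, one row per respondent, each row a list
                   of k item scores (for example 1 = keyed response, 0 = not).
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents.
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of the total (raw) scale score across respondents.
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses of six people to a three-item scale.
rows = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
]
print(round(cronbach_alpha(rows), 2))
```

The same variance estimator must be used for the items and for the total score; population variance is used for both here, and sample variance would give the same ratio.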


Table 2: Example items from the MMPI-2 based PSY-5 scales (scoring direction in parentheses) 

AGGR: Aggressiveness 

 27 (T) When people do me a wrong, I feel I should pay them back if I can, just for the 
        principle of the thing. 
 70 (F) I am easily downed in an argument. 
239 (T) I am entirely self-confident. 
324 (T) I can easily make other people afraid of me, and sometimes do for the fun of it. 
521 (T) I like making decisions and assigning jobs to others. 
548 (T) I've been so angry at times that I've hurt someone in a physical fight. 

PSYC: Psychoticism 

 48 (T) Most anytime I would rather sit and daydream than do anything else. 
 96 (T) I see things or animals or people around me that others do not see. 
198 (T) I often hear voices without knowing where they come from. 
241 (T) It is safer to trust nobody. 
336 (T) Someone has control over my mind. 
508 (T) I often feel I can read other people's minds. 

DISC: Disconstraint 

100 (F) I have never done anything dangerous for the thrill of it. 
105 (T) In school I was sometimes sent to the principal for bad behavior. 
126 (F) I believe in law enforcement. 
266 (F) I have never been in trouble with the law. 
309 (F) I usually have to stop and think before I act even in small matters. 
497 (F) It bothers me greatly to think of making changes in my life. 

NEGE: Negative Emotionality/Neuroticism 

223 (F) I believe I am no more nervous than most others. 
301 (T) I feel anxiety about something or someone almost all the time. 
372 (F) I am not easily angered. 
405 (F) I am usually calm and not easily upset. 
442 (T) I must admit that I have at times been worried beyond reason over something that 
        really did not matter. 
513 (T) Sometimes I get so angry and upset I don't know what comes over me. 

INTR: Introversion 

    (F) I am a very sociable person. 
    (F) I am an important person. 
    (F) I usually feel that life is worthwhile. 
    (F) I usually expect to succeed in things I do. 
    (F) I enjoy social gatherings just to be with people. 
    (T) I am never happier than when I am by myself. 

Note: From MMPI-2 (Minnesota Multiphasic Personality Inventory - 2): Manual for Administration, 
Scoring and Interpretation (2nd ed.), by J. Butcher, J. Graham, Y. Ben-Porath, A. Tellegen, W. 
Dahlstrom, and B. Kaemmer, 2001. Copyright 2001 by the Regents of the University of Minnesota. 
Adapted and reprinted by permission. 


Validity research on the PSY-5 scales 


Discrimination studies 


Several validity studies have focused on the issue of discriminability. For example, 
in a sample of 171 participants from an inpatient/outpatient psychiatric hospital, 
McNulty, Harkness, and Wright (1994) compared the mean PSY-5 scale scores of 
two clinical subgroups to those of the balance of the sample. Mean PSY-5 scale sco- 
res of participants whose diagnoses indicated the presence of psychotic features we- 
re significantly different from the scales' scores in the remainder of the sample only 
for the Psychoticism scale. In comparison to participants diagnosed with a primarily 
depression-related disorder, PSY-5 scale scores for the remainder of the participants 
were significantly lower on the Negative Emotionality/Neuroticism and 
Introversion/Low Positive Emotionality scales. These results were consistent with each of the PSY-5 
constructs, as the lack of significant differences between non-diagnosis-related sca- 
les, along with the mentioned differences in each comparison, were construct rele- 
vant. 

Harkness, Spiro et al. (1995) examined the links between PSY-5 measured 
individual differences and alcohol consumption and self-defined alcohol problems in 
three groups identified from the Boston VA Normative Aging Study sample. In this 
study, high volume drinkers were considerably more likely to evidence higher levels 
of both Disconstraint and Introversion/Low Positive Emotionality than low volume 
drinkers. Furthermore, high volume drinkers who had experienced drinking related 
problems during the course of their lives were considerably more likely to evidence 
higher levels of Negative Emotionality/Neuroticism than those high volume drinkers 
who had not experienced such problems. Given these results, Harkness, Spiro et al. 
speculated on the role alcohol consumption may play as a mechanism for enhancing 
or adapting to one’s trait related personality characteristics, and how knowledge of a 
client’s standing on these traits can play a role in understanding and intervening with 
various drinking patterns. 

In a European sample comprised of both psychiatric inpatients and outpatients, 
Egger, Derksen, and DeMey (1997) explored how well the PSY-5 scales discrimi- 
nated between three common profile types, identified via cluster analysis of the 
MMPI-2 clinical scales. Their analyses clearly showed that the pattern of PSY-5 
scale elevations differentiated between the three groups in construct relevant ways. 
For example, the first cluster was characterized primarily by elevations on scales 8 
and 9. The PSY-5 profile for this cluster evidenced an elevation only on the Psycho- 
ticism scale. The cluster dominated by elevations on scales 2 and 7 evidenced an 
elevation on the PSY-5 Introversion/Low Positive Emotionality scale only. The third 
cluster, characterized by elevations on scales 8, 7, 6, 4, and 2, showed PSY-5 
elevations on the Psychoticism, Negative Emotionality/Neuroticism, and 
Introversion/Low Positive Emotionality scales. 

Rouse, Butcher, and Miller (1999) showed that the PSY-5 Disconstraint scale 
was able to distinguish between substance abusers who had been classified as 
non-abusers (false negatives) and non-substance abusers who had been correctly 
classified (true negatives), based on optimum cutoffs on the MAC-R, AAS, and APS 
scales. Their results indicated that substance abusers misclassified as non-abusers 
obtained higher PSY-5 Disconstraint scale scores than correctly classified 
non-abusers. 

In a study by Bagby, Buis, Nicholson, Parikh, and Bacciochi (1999), the ability of 
the MMPI-2 clinical, content, and PSY-5 scales to differentiate patients with major 
depression, schizophrenia, and bipolar disorder (depressed) was evaluated. A 
comparison of PSY-5 scale mean scores across the three patient groups showed that the 
Disconstraint scale differentiated between the major depression and bipolar depres- 
sed groups, and the Introversion/Low Positive Emotionality scale differentiated bet- 
ween the major depressed and schizophrenia groups. Subsequent stepwise regressi- 
ons showed that the PSY-5 Disconstraint scale was the most effective at distinguis- 
hing patients with bipolar depression from patients with major depression. Bagby et 
al. concluded that both the content and PSY-5 scales provided important information 
in differential diagnosis. 

As indicated earlier, the strongest PSY-5 scale intercorrelation occurs between 
the Psychoticism and Negative Emotionality/Neuroticism scales. Harkness, Mc- 
Nulty, Finger, Arbisi, and Ben-Porath (1999) introduced the idea of item pleiometri- 
city as an explanation for the magnitude of this relation. Pleiometricity refers to 
items that measure more than one construct. The authors hypothesized that psychoti- 
cism items are intrinsically pleiometric: They intrinsically involve detecting a pro- 
blem in oneself. Thus they measure negative emotionality as well as psychoticism. 
To test this idea, the authors examined how the base-rate of psychotic patients in a 
sample influenced the factor structure of the items jointly comprising the PSY-5 
Psychoticism and Negative Emotionality/Neuroticism scales. Confirmatory factor 
analysis using the MMPI-2 normative sample, a sample with a low base rate of psy- 
chotic problems, suggested that the items from the two scales were consistent with a 
single latent factor. In a normal sample, psychoticism items tended to function as 
high-psychometric difficulty indicators of negative emotionality. In a clinical sample 
that evidenced a higher base rate of psychotic patients, two factors were required for 
a plausible solution. Psychoticism items could be pleiometric in this sample, tapping 
both psychoticism and the negative emotionality tendency to see problems in 
oneself. These results offer an explanation for the lack of a separate psychoticism 
factor in models based on normal personality functioning, such as the five factor 
model. In a test of the discriminability of the Psychoticism and Negative 
Emotionality/Neuroticism scales, Harkness, McNulty, Finger et al. regressed the 
count of psychotic symptoms in the clinical sample on both scales simultaneously. 
While the two scales have a positive intercorrelation, the Psychoticism scale had a 
significant positive beta weight in predicting the number of psychotic symptoms, 
while the Negative Emotionality/Neuroticism scale had a significant negative beta 
weight. The Negative Emotionality/Neuroticism scale acted, in effect, as a 
suppressor variable on the Psychoticism scale; knowledge of a person's standing on 
Negative Emotionality or Neuroticism allows the clinician to suppress nuisance 
variance in the subject's self-report of psychotic phenomena. 
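
The suppressor pattern can be made concrete with the standard two-predictor formula for standardized regression weights, beta_1 = (r_y1 - r_y2 x r_12) / (1 - r_12^2). The Python sketch below is only an illustration: the predictor intercorrelation of .52 is the normative-sample value reported earlier, used here as a stand-in, and the two correlations with the symptom count are hypothetical values invented for the example, not results from Harkness, McNulty, Finger et al.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized regression weights for two predictors.

    r_y1, r_y2 -- correlations of predictors 1 and 2 with the criterion
    r_12       -- correlation between the two predictors
    """
    denom = 1.0 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    return beta_1, beta_2


R_PSYC_NEGE = 0.52      # scale intercorrelation (normative sample, from the text)
R_SYMPTOMS_PSYC = 0.40  # hypothetical correlation with the symptom count
R_SYMPTOMS_NEGE = 0.10  # hypothetical correlation with the symptom count

beta_psyc, beta_nege = standardized_betas(
    R_SYMPTOMS_PSYC, R_SYMPTOMS_NEGE, R_PSYC_NEGE
)
print(round(beta_psyc, 3), round(beta_nege, 3))
```

With these assumed values the second predictor has a small positive zero-order correlation with the criterion but a negative beta weight, and the first predictor's beta exceeds its zero-order correlation, which is the signature of suppression.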


Relations with other individual differences models 


A series of studies had as their focus the relations between the PSY-5 scales and 
those of other models. Harkness, McNulty, and Ben-Porath (1995) presented an 
analysis of the relations between the PSY-5 scales and Tellegen's Multidimensional 
Personality Questionnaire (MPQ; manuscript in preparation) superfactor scales in a 
sample of 838 college students. PSY-5 Negative Emotionality/Neuroticism, 
Disconstraint, and Introversion/Low Positive Emotionality correlated as expected with 
the corresponding MPQ superfactors: Negative Emotionality, Constraint, and 
Positive Emotionality, respectively. 

Participants in the Boston VA Normative Aging Study were again the source for 
a study presented by Harkness, Spiro et al. (1995). The authors correlated the PSY-5 
scales with a modified version of the Big Five factors based on Goldberg's 1992 
adjective markers. The modified Big Five component scores showed the strongest 
convergent relations with PSY-5 Negative Emotionality/Neuroticism and Introver- 
sion/Low Positive Emotionality, consistent with Harkness and McNulty's (1994) 
speculations. While PSY-5 Aggressiveness, Disconstraint and Psychoticism, respec- 
tively, had no strong correlations with the Big Five Agreeableness, Conscientious- 
ness, and Openness components, of interest was the correlation between PSY-5 Ag- 
gressiveness and Big Five Extraversion (r = .39). Upon investigation, Harkness, 
Spiro et al. found strong relations between the PSY-5 Aggressiveness scale and the 
bold, assertive, demanding, and timid (-) Goldberg extraversion markers, each con- 
sistent with the Aggressiveness construct. In a final set of analyses, Harkness, Spiro 
et al. correlated the PSY-5 scale scores with those of the Sixteen Personality Factor 
Questionnaire (16PF; Cattell, Eber, & Tatsuoka, 1970) administered approximately 
25 years earlier. There was strong support for convergent and divergent construct 
validity, and the length of time between the administration of the instruments provi- 
ded additional support for the stability of the PSY-5 constructs. 

Trull et al. (1995) examined the relations between PSY-5, NEO-PI, and NEO-PI- 
R scores in community and clinical samples. Of particular interest are the results 
from the clinical sample (N = 56). As Harkness and McNulty (1994) predicted, the 
strongest relations between the PSY-5 and NEO-PI based FFM domain scores 
occurred between PSY-5 Negative Emotionality/Neuroticism and NEO-PI 
Neuroticism, and between PSY-5 Introversion/Low Positive Emotionality and NEO-PI 
Extraversion. PSY-5 Aggressiveness evidenced a moderate correlation with NEO-PI 
Agreeableness, also consistent with Harkness and McNulty’s expectation. Finally, 
PSY-5 Disconstraint showed a moderate correlation with NEO-PI Conscientious- 
ness, and PSY-5 Psychoticism was moderately correlated with NEO-PI Openness to 
Experience. 

McNulty, Harkness, and Ben-Porath (1998) explored the relations between four 
individual differences models of personality in a college sample of 291 introductory 
psychology students: the PSY-5, Costa and McCrae's (1992) NEO Personality In- 
ventory-Revised (NEO-PI-R), Tellegen's Multidimensional Personality Question- 
naire (MPQ; manuscript in preparation), and the Alternative Five model (ZKPQ-III; 
Zuckerman, Kuhlman, Joireman, Teta, & Kraft, 1993). Scales at the five-factor level 
from each model were submitted to a multiple battery factor analysis, a factor ana- 
lytic technique that minimizes the impact of battery specific variance on the resul- 
tant factor solution (cf., Browne, 1980; Millsap, 1995). Extraction and oblique rota- 
tion of a five factor solution provided the best representation of the variance com- 
mon across the four models. These factors were labeled Neuroticism, Communal 
Positive Emotionality, Aggressiveness, Disconstraint, and Agentic Positive Emotio- 
nality. 

Regarding the PSY-5 model, the Negative Emotionality/Neuroticism and Aggres- 
siveness scales were strongly and uniquely related to the Neuroticism and Aggressi- 
veness common factors, respectively. PSY-5 Introversion loaded primarily on the 
Communal Positive Emotionality factor, but had a secondary loading on the Agentic 
Positive Emotionality factor as well. This pattern suggested that the PSY-5 Introver- 
sion scale taps both aspects of the broad Positive Emotionality domain. PSY-5 Dis- 
constraint showed a primary loading on the Disconstraint factor and a secondary 
loading on the Aggressiveness factor, suggesting that this scale reflects an aggressi- 
ve impulsivity, a construct similar to that hypothesized by Siever and Davis (1991) 
as particularly relevant to the domain of personality disorders. Finally, PSY-5 Psy- 
choticism loaded moderately on the Neuroticism factor. 

Voelker and Nichols (1999) explored the relations between several MMPI-2 
based scales, including the PSY-5, and the Factor 1, Factor 2, and Total scores of the 
Hare Psychopathy Checklist-Revised (PCL-R; Hare, 1991). The item content of 
PCL-R Factor 1 includes superficial charm, grandiose sense of self-worth, 
manipulativeness, and lack of empathy, each related to the PSY-5 Aggressiveness 
construct. For PCL-R Factor 2, the item content includes boredom proneness, poor 
behavior control, impulsivity, juvenile delinquency, and behavior problems, each 
related to the PSY-5 Disconstraint construct. In a sample of 100 correctional 
inmates, PSY-5 Aggressiveness was the strongest correlate of PCL-R Factor 1 
(r = .30), while PSY-5 Disconstraint was the strongest correlate of PCL-R Factor 2 
(r = .34). Furthermore, PSY-5 Aggressiveness was essentially uncorrelated with 
PCL-R Factor 2 (r = .16) and PSY-5 Disconstraint was uncorrelated with PCL-R 
Factor 1 (r = .10), with only the PSY-5 Introversion/Low Positive Emotionality 
scale obtaining an additional moderate correlation with Factor 1 (r = -.24). 
Voelker and Nichols' results confirmed the convergent and discriminant aspects of 
both the Aggressiveness and Disconstraint PSY-5 constructs. 


Relations between the PSY-5 scales and extratest criteria 


Prediction of relevant extratest characteristics (e.g., diagnostic symptoms, persona- 
lity characteristics) is the focus of a third set of validity studies. Trull et al. (1995) 
examined the relations between PSY-5, NEO-PI, and NEO-PI-R scores in commu- 
nity and clinical samples. In a final analysis, the validity of the PSY-5 scale scores 
was assessed in predicting symptom counts for the DSM-III-R personality disorders. 
Two instruments, the Structured Interview for DSM-III-R Personality — Revised 
(SIDP-R; Pfohl, Blum, Zimmerman, & Stangl, 1989) and the Personality Diagnostic 
Questionnaire — Revised (PDQ-R; Hyler & Reider, 1987), were administered to the 
clinical sample participants (N = 56). The PSY-5 scales were significant predictors 
of the symptom counts for all 13 of the structured interview (SIDP-R) PDs (mean R² 
= .29, range = .21 to .51), and 12 of the 13 self-report (PDQ-R) PDs (mean R² = .31, 
range = .19 to .64). 

McNulty, Ben-Porath and Watt (1997) analyzed the incremental validity of the 
PSY-5 and Five Factor models in predicting SCID-II personality disorder sympto- 
matology in a sample of substance abusers participating in a VA hospital addiction 
recovery program. In their summary of a series of regression analyses where the 
PSY-5 and NEO-PI-R domain scales were alternately entered first, the mean 
increase in R² for the PSY-5 scales was .085 (range = .033 to .206), compared with .024 
(range = .000 to .047) for the NEO-PI-R scales. An important finding was the con- 
tribution of the PSY-5 Psychoticism scale to the prediction of SCID-II schizotypal 
personality disorder symptomatology. Consistent with Harkness and McNulty's 
(1994) speculations, the Psychoticism scale was the key to distinguishing between 
the schizoid and schizotypal personality disorder symptom scores. 

Rouse (1997) studied the validity of the PSY-5 scales using data from the Min- 
nesota Psychotherapy Assessment Project. Rouse first reported that the correlations 
between the PSY-5 scales and the MMPI-2 clinical, content, and supplementary 
scales were meaningful and relevant to the PSY-5 constructs. In a second set of 
analyses, Rouse utilized regression analyses to predict individual PSY-5 scale scores 
from symptom indices created from a priori judgements of construct relevant 
symptoms. From rater-identified symptoms relevant to each PSY-5 construct, an 
index was created for each PSY-5 scale. Each PSY-5 scale was then regressed on its 
associated symptom index. Multiple Rs ranged from .22 for the Psychoticism scale 
to .41 for the Disconstraint and Introversion/Low Positive Emotionality scales. As 
the multiple Rs reflected the relations between informant and self-report data, Rouse 
argued that these multiple Rs provided a "stress test", or lower bound, of the PSY-5 
scales’ construct validity. Rouse concluded that the significant correlations between 
the non-test measures (symptom indices) and the PSY-5 scales, and the pattern of 
correlations between the PSY-5 scales and the MMPI-2 clinical, content, and sup- 
plementary scales, supported the construct validity of the PSY-5 scales, and demon- 
strated the utility of the PSY-5 scales in understanding inpatient and outpatient psy- 
chopathology. 


Perspectives on PSY-5 interpretation 


Harkness (in press), Harkness and Lilienfeld (1997), Harkness and McNulty (in 
press), and Siever and Frucht (1997) have emphasized the relevance and importance 
of an individual differences perspective to understanding psychopathology, and have 
offered suggestions for incorporating individual differences models in assessment 
and treatment. With respect to the PSY-5, Harkness, Royer, and Gill (1996) presen- 
ted an approach for organizing MMPI-2 assessment feedback using the PSY-5 sca- 
les. This individual differences approach, modeled on that developed by Stephen 
Finn (1996), was considered extremely or mostly accurate and helpful by over 95 
per cent of the participants from an outpatient university counseling center sample. 

PSY-5 scale interpretation issues have been addressed in two studies. In the first, 
Rouse, Finger, and Butcher (1999) applied an item response theory (IRT) approach 
to examining the psychometric properties of the PSY-5 scales. Full-information 
factor analyses confirmed that each of the PSY-5 scales is adequately unidimensio- 
nal for IRT analysis. Test information functions for the PSY-5 scales indicated two 
different patterns. For the PSY-5 Aggressiveness, Psychoticism, and Negative Emo- 
tionality/Neuroticism scales, the greatest discrimination between test respondents 
occurred at the high end of the trait dimension. For the Aggressiveness and Negative 
Emotionality/Neuroticism scales, while respondents with low to normal range scores 
can be modestly discriminated from each other, scale scores of 60 or higher eviden- 
ce much greater differential trait levels with each succeeding increase in score. The 
Psychoticism scale evidences little discriminability for persons scoring lower than 
65. However, each change in T score above 65 offers a great deal of discriminant 
information. The second test information function pattern pertains to the PSY-5 Dis- 
constraint and Introversion scales. For both of these scales, test discrimination is 
distributed bilaterally and broadly across trait levels. 

Harkness et al. (in press) reviewed the correlations between extratest variables 
and each of the PSY-5 scales from a large outpatient community mental health 
sample (detailed information on the study site, sample characteristics, instruments, 
and procedures is found in Graham, Ben-Porath, & McNulty, 1999). Available extratest 
data included variables from a standardized intake interview and mental status exam, 
scale scores from the self-report SCL-90-R, responses to a 188-item patient descrip- 
tion form (PDF) completed by the client's therapist following the third therapy ses- 
sion, and scores for 25 empirically derived scales that tap the important content do- 
mains reflected by the PDF items. The analyses were conducted by gender, and only 
with participants obtaining a valid MMPI-2 (N = 410 men, N = 610 women). 

The pattern of Aggressiveness scale correlates reflected aggressive, physically 
abusive, and antisocial behavior in both men and women. Passive behavior in 
interpersonal relations was associated with low scores in both genders. For women, 
there were several additional construct relevant correlates, such as being perceived 
as narcissistic, assertive, egocentric, grandiose, selfish, overbearing in 
relationships, and over-evaluating their own worth. 

At this study site, clients with severe psychopathology were referred elsewhere. 
Consequently, there was a noticeable lack of such pathology in the study sample. 
However, self-report paranoid ideation and psychoticism scale correlations from the 
SCL-90-R were the strongest for the PSY-5 Psychoticism scale. Otherwise, the pat- 
tern of correlations suggested generally low functioning, isolation, depression, and 
poor coping abilities, in both interpersonal and work-related situations. Only for 
women did evidence of hallucinations at intake correlate with Psychoticism. 

The correlates of the PSY-5 Disconstraint scale suggested a general tendency to 
act out and behave impulsively. Substance abuse was a prominent feature of higher 
scores, as well as being in trouble with the law. Many of the aggression related cor- 
relates of the PSY-5 Aggressiveness scale were shared with the Disconstraint scale. 
Several Disconstraint relevant correlates were evidenced only by women, including 
procrastination, deception, evasiveness, failure to complete projects, insensitivity, 
manipulation, tending to intellectualize or rationalize, and feigning remorse when 
their behavior lands them in trouble. 

There was a substantial degree of overlap in the correlates of the Negative Emo- 
tionality/Neuroticism and Introversion scales, primarily related to depression and 
related symptoms, and to anxiety. This is not unexpected, given the high degree of 
comorbidity between these two symptom patterns in clinical populations, as well as 
the magnitude of the correlation between these two constructs that is typically 
found. Anxiety did, however, show a generally stronger relation with the Negative 
Emotionality/Neuroticism scale, while depression was somewhat more strongly re- 
lated to the Introversion scale. Correlates unique to Negative Emotionali- 
ty/Neuroticism included histrionic behavior, low frustration tolerance, and being 
over-reactive. Correlates indicative of a lack of achievement orientation were relati- 
vely unique to the Introversion scale, as were being perceived by one’s therapist as 
introverted, shy, socially awkward, and not sexually adjusted. These correlates of 
the Introversion scale indicated that it is measuring both the agentic and 
communal domains included in the broader Positive Emotionality construct. 


General administration and scoring recommendations 


Consistent with the general administration guidelines provided in Butcher et al. 
(2001), we recommend administration of the entire MMPI-2 questionnaire. This will 
provide the test user with information concerning the respondent’s test-taking 
approach as well as scores on the currently available clinical, content, and 
supplementary scales in addition to the PSY-5 scales. The ability to review the 
respondent’s test-taking approach, as indicated earlier, was one of the primary 
reasons for developing the MMPI-2-based PSY-5 scales. Furthermore, information 
available from the other MMPI-2 scales can provide useful insights for the 
individual differences interpretation approach available through the PSY-5 scales. 
While an abbreviated 
administration of the first 370 MMPI-2 items provides sufficient information for computing the 
original validity and clinical scale scores, the items comprising the PSY-5 scales are 
spread throughout the MMPI-2 item pool. Consequently, such an abbreviated admi- 
nistration would not provide all of the responses required for calculation of PSY-5 
raw scale scores. 

With the availability of gender-based mean and standard deviation data for the 
PSY-5 scales (Butcher et al., 2001), linear T scores can be manually calculated from 
PSY-5 scale raw scores. Recently, the MMPI-2-based PSY-5 scales were included 
in the computerized scoring and interpretation services provided by National Computer
Systems (NCS: P.O. Box 1416, Minneapolis, MN 55440).
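
The manual calculation of linear T scores can be sketched as follows. A linear T score rescales a raw score so that the reference group has a mean of 50 and a standard deviation of 10. This is a minimal sketch only: the function name and the mean and standard deviation in the example are hypothetical placeholders, and the actual gender-based values must be taken from Butcher et al. (2001).

```python
def linear_t_score(raw, mean, sd):
    """Convert a raw scale score to a linear T score (mean 50, SD 10)."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return 50.0 + 10.0 * (raw - mean) / sd


# Hypothetical normative values, for illustration only
# (NOT taken from the MMPI-2 manual).
example_mean, example_sd = 8.0, 4.0

# A raw score 1.5 SD above the reference mean maps to T = 65.
print(linear_t_score(14, example_mean, example_sd))  # 65.0
```

In practice the mean and standard deviation would be looked up separately for men and women, since the published norms are gender-based.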


Clinical interpretation of the PSY-5 scales 


The PSY-5 scales provide an overview of major personality trait features for the 
MMPI-2 test respondent. Interpretive approaches such as those suggested by But- 
cher and Williams (2000), Finn (1996), and Graham (2000) consider such a formu- 
lation to be a central ingredient in the overall interpretation. General interpretive 
statements for the PSY-5 scales are provided in Harkness et al. (in press) and in
Butcher et al. (2001). We provide a summary in this chapter. The interpretation of
high and low scale scores is guided by test information functions computed from the 
MMPI-2 normative sample (Rouse, Finger, & Butcher, 1999). 

Elevated Aggressiveness scale scores (T scores greater than 65) suggest the ten- 
dency toward offensive aggression, toward using aggression in the pursuit of one's 
goals. A high scorer enjoys dominating others and derives satisfaction from the sen- 
se of power associated with aggressive displays. In the outpatient sample, high Ag- 
gressiveness scale scores were associated with a history of being physically abusive 
and therapist ratings as having aggressive and antisocial features. Men were more 
likely to have histories of committing domestic violence while women were more 
likely to have been arrested. Finally, women with high Aggressiveness scale scores
were rated by their therapists as extroverted. The test information function for the 
Aggressiveness scale indicates that low scores should not be interpreted. 

Elevated Psychoticism scale scores (T scores above 65) suggest difficulty in 
maintaining contact with consensually validated reality. Eccentric thought and beha- 
vior patterns, perceptual distortions, and difficulty in making appropriate cause and 
effect attributions (particularly in interpersonal situations) are the primary characteristics
of high scorers. In an inpatient sample (Harkness et al., 1999), persons with
high scores were more likely to have admission notations of psychosis, paranoid 
suspiciousness, ideas of reference, loosening of associations, hallucinations, or flight 
of ideas. In an outpatient sample, with lower base-rates of psychotic phenomena, 
elevations were associated with lower functioning at admission and having few or 
no friends. Both outpatient men and women were characterized as depressed on 
mental status examination, and were rated by their therapists as not being achievement
oriented. A sad mood at admission and therapist-rated anxiousness and depression
were characteristics of outpatient men with high Psychoticism scale scores.
Women with high Psychoticism scale scores were more likely to report hallucinati- 
ons at admission. Low Psychoticism scores should not be interpreted. 

High Disconstraint scale scores (T scores greater than 65) suggest that the res- 
pondent is not bound by traditional moral values and dictates, engages in risky be- 
havior, and evidences a general lack of control in responding to internal or external 
situational contexts. Elevated scores in the outpatient sample were associated with a
history of being arrested and with histories of alcohol, cocaine, and marijuana abuse.
In addition, high Disconstraint scorers were rated by their
therapists as aggressive and antisocial. Men had histories of committing domestic 
violence, while women were characterized as somewhat achievement oriented by 
their therapists. Low scores on Disconstraint (T scores less than 40) suggest a redu- 
ced tendency for risk taking, greater self-control, and reduced impulsivity. Low sco- 
rers evidence greater boredom tolerance, a tendency to be a rule follower, and a 
slight tendency to prefer romantic partners with similarly constrained personality 
patterns. 

Elevated Negative Emotionality/Neuroticism scale scores (T scores greater than 
65) suggest a tendency to attend to danger signals. High scorers view events and 
recollections through a filter with a bias toward catching signals of danger and pro- 
blems. They tend to view the world as more threatening and dangerous than others 
do. The affect of high scorers tends to be dominated by worry, tension, and a pro- 
pensity to experience guilt about one's actions. In the outpatient sample, elevated 
scores were associated with diagnoses of depression or dysthymia, low functioning, 
and the tendency to have few or no friends. Therapist ratings and results of mental
status exams indicated that anxiety, depression, and a sad mood state characterized high
scorers as well. High scoring men were more likely to have committed domestic
violence, in keeping with maintaining a focus on the flaws, problems, and irritations
in one's spouse, life, and future prospects. Consistent with links reviewed by
Watson and Pennebaker (1989), complaints of somatic symptoms were more common
among high Negative Emotionality/Neuroticism scale scorers. Evidence of
histories of alcohol abuse and therapist ratings as pessimistic and lacking achieve- 
ment orientation characterized high scoring women. Low Negative Emotionality/ 
Neuroticism scale scores should not be interpreted. 

Men and women with high scale scores (T scores greater than 65) on Introversi- 
on/Low Positive Emotionality have a reduced capacity to experience positive emoti- 
ons, and are much more comfortable engaging in solitary activities or interpersonal 
activities with only one or a few friends. High scores on the Introversion/Low Posi- 
tive Emotionality scale also suggest a lower achievement orientation. Increased rates 
of dysthymia and depression, and feelings of depression and sadness during com- 
pletion of a mental status exam were evident in the outpatient sample for elevated 
scores. Therapists rated high scorers as having a low achievement orientation, and as 
anxious, depressed, introverted, pessimistic, and complaining of somatic symptoms. 
Antidepressant medications were more likely to be prescribed to high scoring wo- 
men, who were also seen as having few or no friends. Persons with low scale scores 
(T scores less than 40) exhibited an Extroverted/High Positive Emotionality pattern. 
Such scores suggest a greater capacity to experience pleasure and joy, and more social
interests and energy.
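
The cutoffs described in this section can be summarized in a short sketch. It assumes only what is stated above: T scores greater than 65 are treated as high on every scale, T scores less than 40 are treated as low only for Disconstraint and Introversion/Low Positive Emotionality, and low scores on the other three scales are not interpreted. The function and constant names are illustrative and not part of any published scoring software.

```python
HIGH_CUTOFF = 65  # T scores greater than this are interpreted as elevated
LOW_CUTOFF = 40   # T scores less than this are interpreted as low

# Scales for which the text above gives an interpretation of low scores.
LOW_INTERPRETABLE = {
    "Disconstraint",
    "Introversion/Low Positive Emotionality",
}


def classify_psy5(scale, t_score):
    """Return 'high', 'low', or 'not interpreted' for one PSY-5 T score."""
    if t_score > HIGH_CUTOFF:
        return "high"
    if t_score < LOW_CUTOFF and scale in LOW_INTERPRETABLE:
        return "low"
    return "not interpreted"


print(classify_psy5("Psychoticism", 72))   # high
print(classify_psy5("Disconstraint", 35))  # low
print(classify_psy5("Aggressiveness", 35)) # not interpreted
```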


A case example


Figure 1 shows the PSY-5 profile for a 26-year-old, single, Caucasian woman seen
at an outpatient clinic. The validity scales were all within acceptable limits, sugges- 
ting that the client completed the MMPI-2 in a straightforward manner. Elevations 
on both of the PSY-5 Disconstraint and Psychoticism scales, the low Introver- 
sion/Low Positive Emotionality scale score, and the absence of clinically relevant 
scores for the Aggressiveness and Negative Emotionality/Neuroticism scales provi- 
de the organizing framework for understanding the client's prominent personality 
characteristics. 

The client presented with interpersonal problems, primarily in her family, and al- 
cohol and drug abuse. During the intake interview she acknowledged having pre- 
viously received individual and group psychotherapy, an arrest record, a lifetime 
history of alcohol and marijuana abuse, and having been physically and sexually 
abused. She did not complete high school but had obtained her GED. During the 
intake interview, the client was cooperative and displayed a full range of affect, in- 
cluding happiness, anxiety, and sadness. Following the client's third therapy session, 
her therapist (who was blind to the MMPI-2 results) rated her on a number of perso- 
nality characteristics. She was rated as ignoring or intellectualizing problems and 
having difficulty trusting others, tending to keep them at a distance. The therapist 
concluded that interpersonal relationships tend to be stormy, particularly within her 
family. She resents family members and feels that her family lacks love. The therapist
further concluded that although she sees her interpersonal relationships as unfulfilling,
she has a very strong need to be with others. The client was rated as moderately
aggressive, angry, critical, and argumentative at times during therapy. However,
her therapist indicated an absence of deep emotions.

Figure 1. PSY-5 scale scores for a 26-year-old female outpatient

From a PSY-5 individual differences perspective, one key issue concerns the ba- 
sis of the client's interpersonal difficulties and the aggressive displays evidenced in 
therapy. Both PSY-5 Aggressiveness and Negative Emotionality/Neuroticism are 
within normal limits, suggesting that she does not attempt to dominate others nor is 
it likely that trait-like anxiety is the typical genesis for such behavior. Her therapist 
indicated that she has a strong need to be with other people, and her PSY-5 Introver- 
sion/Low Positive Emotionality score suggests that she enjoys the company of 
others and has the capacity to experience positive emotions. However, her ability to 
accurately interpret her social, interpersonal world may be compromised. 

Thus, it is the Psychoticism scale that may provide an essential clue. This client may
have difficulty understanding the actions of others and may feel alienated as a result. 
Furthermore, she may interpret others’ behavior as threatening and react with ag- 
gressive displays. This pattern may then be amplified by her inability to inhibit her 
responses, suggested by the elevated Disconstraint scale score. This conceptualizati- 
on of the case suggests a different formulation and therapeutic approach compared 
to one focused solely on the client's aggression. 

A second key issue concerns the client's substance abuse problem. Her elevated 
PSY-5 Disconstraint score points to difficulties inhibiting impulsive behavior, and is 
often found in persons with substance abuse problems. Her low Introversion/Low 
Positive Emotionality score, indicative of the Extroverted pattern, may also provide 
clues to the dynamics of her substance abuse. The client enjoys the company of 
others and likely seeks out interpersonal relationships. However, consistent with her 
high Psychoticism score, she may also have difficulty interpreting others' social 
cues; she may feel alienated from others and view others' behavior as threatening. It is
possible that alcohol and marijuana use dampen the sense of alienation and threat, 
helping the client to enjoy her interpersonal relations more fully. For both of these 
key therapeutic issues, PSY-5 Psychoticism provides information not readily availa- 
ble from other models. 


Conclusion 


The PSY-5 model was derived from indicators of both normal and abnormal perso- 
nality. Using both types of markers resulted in PSY-5 constructs that resemble parts 
of the FFM. However, PSY-5 constructs differ in important ways from the FFM.
Scales to measure the PSY-5 constructs were developed using the MMPI-2 item 
pool. The MMPI-2 PSY-5 scales evidence adequate psychometric properties as well 
as convergent and discriminant relations with clinically relevant extra-test data. 
Comparisons of the MMPI-2 based PSY-5 scales with other models, including the
FFM, indicate common Negative Emotionality/Neuroticism and Introversion/Low
Positive Emotionality measures. PSY-5 Aggressiveness is also similar to FFM
Agreeableness. There is less similarity between PSY-5 Disconstraint and FFM
Conscientiousness. PSY-5 Psychoticism is quite different from any of the FFM constructs
and provides a unique contribution to understanding personality functioning.

From a clinical perspective, a broad overview of major personality trait features
is an important component of adequate case conceptualization and treatment planning
(e.g., Harkness & Lilienfeld, 1997). The similarities and differences between the
PSY-5 and FFM outlined in this chapter can help the clinician determine the appropriateness
of either or both models in addressing important clinical issues.
Chapter 20


The Professional Personality Questionnaire 


Paul Barrett 


Introduction 


The Professional Personality Questionnaire (PPQ) was created by the late Paul Kline 
and his research student Sharon Lapham, in the late 1980s, with the first publication 
of its factor pattern and scale correlations in 1991. It is a brief questionnaire measu- 
ring five broad scales that are designed to assess the Big Five factors of McCrae and 
Costa (1987) in the work environment. The PPQ scales are named Insecurity versus
Confidence (Big Five Anxiety), Conscientiousness versus Carelessness, Introversion-Extraversion,
Tough versus Tender-Minded (Big Five Agreeableness), and Conventional
versus Unconventional (Big Five Openness to Experience). A sixth Invalidity
V scale is scored as a measure of inaccuracy/inconsistency of responding. The
PPQ was constructed to provide a brief but reliable measure of these factors, suitable 
for use in Industrial/Organizational (I/O) personnel selection and staff development
screening in the UK. An initial item pool of 100 items was generated in which all 
references to clinical and psychiatric terminology and symptoms were removed. The 
items were all configured to be "work-relevant" and face-valid. Guiding the item 
generation process was the consistent aim to acquire measures of the Big Five fac- 
tors, but with items that were highly face-valid within the work context. Many more 
than the final 100 "pilot" items were generated by both test authors, to be followed 
by informal semantic analysis to remove obvious duplicates or ambiguous items. 
The initial set of 100 pilot items was then administered to 1472 university students. 
Each item uses a binary “yes/no” response format, and is scored 1/0 in the targeted 
direction. From the initial factor and item analyses, a subset of 68 items was selec- 
ted; these items possessed desirable psychometric indices of reliability and validity. 
Although four papers were published by Kline and Lapham (1991a, 1991b, 1992a,
1992b), and one by Kline and Barrett (1994), the test itself was never published, 
marketed, or sold. It was originally sponsored for development by Personality Sys- 
tems Ltd., a UK test publishing company; but the company became insolvent in 
early 1990 and subsequently ceased trading. The test thus remained a research instrument
that is only now being made available to the wider test community (with
the agreement of Paul Kline's widow).

Big Five Assessment, edited by B. De Raad & M. Perugini. © 2002, Hogrefe & Huber Publishers.


The questionnaire and score key 


Appendices A and B provide a complete listing of the final 68 test items, including 
the test instructions, along with the score key for the test. As can be seen from the 
item listing, nearly all the items focus on the work-environment or aspects of work. 
About half the items are keyed “Yes”, and acquiescence was checked in the item 
analyses and factor analyses by showing that there were no differences in factor loa- 
dings between positive and negative items. The average completion time for the test 
is about 10 minutes. 
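
The 1/0 scoring in the keyed direction can be illustrated with a small sketch. The three-item key below is hypothetical and is not the published PPQ score key (which is given in Appendix B); treating an unanswered item as scoring 0 is likewise an assumption made only for this illustration.

```python
def score_scale(responses, key):
    """Count the responses given in the keyed direction.

    responses: dict mapping item number -> "yes" or "no"
    key:       dict mapping item number -> keyed response for one scale
    Each keyed response scores 1; any other (or missing) response scores 0.
    """
    return sum(1 for item, keyed in key.items() if responses.get(item) == keyed)


# Hypothetical three-item key, NOT the published PPQ key.
example_key = {1: "yes", 2: "no", 3: "yes"}
example_responses = {1: "yes", 2: "yes", 3: "yes"}

# Items 1 and 3 are answered in the keyed direction; item 2 is not.
print(score_scale(example_responses, example_key))  # 2
```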


The psychometrics of the PPQ 


Factor analysis 


Kline and Lapham (1991a) administered the 100-item pilot test to 1,472 UK students
(906 females and 566 males) in four universities. A principal components analysis
was undertaken on the joint gender dataset since, according to Kline and Lapham
(1991a), there were no sex differences in separate sample factors. Five factors were
extracted using the scree test, and subsequently rotated using direct oblimin rotation
with delta = 0.0. The complete table of factor loadings is given in that paper, albeit
referenced to the 100-item questionnaire. The significant feature is that the 68 items
chosen to form the PPQ do load on their respective factors at > |.30|, although there is
some limited item complexity (the targeted loading is always higher than any
other loading on a different factor).


Table 1. Scale alpha reliability coefficients for the 68 item PPQ, based upon a sample of 1,472 
mixed-gender UK university students and 253 mixed-gender UK adult volunteers 


Scale                                      Number of   Student   Adult
                                           Items       Alphas    Alphas
Insecurity (Anxiety)                       15          .76       .79
Conscientiousness                          13          .78       .79
Introversion                               13          .76       .80
Tender Minded (Agreeableness)              12          .70       .72
Unconventional (Openness to Experience)    15          .73       .77
Alpha Reliabilities


Kline and Lapham (1991a) report the scale alphas computed using the total sample
of 1,472 students. They are shown here in the "Student Alphas" column of
Table 1.

As can be seen, all alphas are greater than or equal to .70, with the Tough-Minded 
scale having the lowest alpha coefficient of .70. A second sample of data also provi- 
ded further evidence of alpha reliabilities of greater than .70 for all scales. This evi- 
dence was drawn from 253 adult volunteers (186 females and 67 males) who also 
took part in a series of psychophysiological and chronometric tasks at the Biosignal 
laboratory in the Institute of Psychiatry. The total sample originally provided data 
for a joint-questionnaire multidimensional scaling analysis that was reported in 
Kline and Barrett (1994). Although this latter paper did not report the scale alphas, it 
is worthwhile reporting them in Table 1 for comparative purposes. As can be seen, 
these alphas are slightly higher than the student sample — although not appreciably 
different. To date, there is no reported evidence of test-retest reliability of the five
scales. 
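
For readers who wish to recompute such coefficients from item-level data, the following is a minimal sketch of coefficient alpha using the standard formula, alpha = k/(k - 1) x (1 - sum of item variances / variance of total scores). The toy response matrix is invented purely for illustration and has no connection to the PPQ data.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Coefficient alpha for a list of respondents, each a list of item scores."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents (sample variance, n - 1).
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of the total scale score across respondents.
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Invented toy data: 4 respondents x 3 binary (1/0) items.
toy = [[1, 1, 1], [1, 1, 0], [0, 0, 0], [1, 0, 0]]
print(round(cronbach_alpha(toy), 2))  # 0.75
```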

With regard to the sampling characteristics of the Biosignal laboratory sample, 
the mean and standard deviation age of the female sample was 35 and 11.6 years, 
respectively, with a range of 16 to 59 years. For the 67 males, the mean was 34 years 
with a standard deviation of 11.5 years, with the same age range as the females. The 
adult volunteers were enlisted into the Biosignal Laboratory participant pool using 
advertising in local newspapers, local employment offices, and within surrounding 
adult-education colleges in the South London geographical area. Testing within the 
laboratory always took place during working hours, hence most of the female sam- 
ple were housewives, part-time employees, and unemployed individuals. The male 
sample was primarily composed of part-time and unemployed individuals. Although 
educational histories and biographical information were not acquired from the individuals
comprising the laboratory sample, IQ scores are available. These were acquired
using the Jackson Multidimensional Aptitude Battery (1984), a group-administered
version of the Wechsler Adult Intelligence Scales - Revised (1981).
Table 2 below provides the descriptive statistics of the three summary IQ variables
for this sample.


Table 2. Summary Multidimensional Aptitude Battery IQ variable statistics computed over the 
sample of 253 mixed-gender UK adult volunteers. 


                      Verbal IQ   Performance IQ   Full-Scale IQ
Mean                  (104        108.2            109.5
Median                mu          110              111
Standard Deviation    12.34       13.72            12.11
Minimum Value         72          63               67
Maximum Value         137         137              136
Table 3. Scale score intercorrelations of the PPQ, based upon a sample of 1,472 mixed gender UK
university students. The figures in brackets are the scale correlations computed using the sample of
253 volunteer adults from the Biosignal Laboratory panel.


Scale            Conscientious   Introversion   Tender Minded   Unconventional
Insecurity       2059 (310)      .38 (.46)      .15 (.26)       -.29 (-.23)
Conscientious                    .01 (.16)      -.25 (-.11)     -.42 (-.26)
Introversion                                    .35 (.39)       -.23 (-.28)
Tender-Minded                                                   .13 (.00)
Unconventional


It is clear that the IQ scores of this sample are above average (given a normative
mean of 100 and standard deviation of 15), with only 17 per cent of individuals having
Full-Scale IQs less than 100.


Scale Inter-Correlations 


The scale-score correlations between the five personality scales, which are also reported
in Kline and Lapham (1991a), are provided in Table 3.

Within this table, both the student and Biosignal Laboratory sample inter- 
correlations are reported. These correlations indicate that certain scales are definitely 
not independent from one another. If we correct the highest student sample scale 
intercorrelations for unreliability of measurement using the alphas from the N =
1,472 student sample, it turns out that the observed correlation in the student sample 
data between Conscientiousness and Unconventionality of .42 then becomes .55. 
The observed correlation of .29 between Insecurity and Unconventionality then 
becomes .38. That between Introversion and Tender Mindedness (.35) then becomes 
.48, and the correlation of .38 between Insecurity and Introversion becomes .50. 
Whilst the corrected correlations are not observable, they do indicate that taking into 
account measurement error as defined within classical test theory, there is reason- 
ably substantive correlation between certain scales of the test. 
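
The correction described above is the classical correction for attenuation: the observed correlation is divided by the square root of the product of the two scale reliabilities. The sketch below applies it to the magnitudes quoted in the text, using the student-sample alphas from Table 1. The function name is illustrative. The results agree with the figures quoted in the text to within about .01; the small differences presumably reflect rounding of the published alphas and correlations.

```python
from math import sqrt


def disattenuate(r_xy, rel_x, rel_y):
    """Correct an observed correlation for unreliability in both measures."""
    return r_xy / sqrt(rel_x * rel_y)


# Student-sample alphas from Table 1.
ALPHA = {
    "Insecurity": 0.76,
    "Conscientiousness": 0.78,
    "Introversion": 0.76,
    "Tender Minded": 0.70,
    "Unconventional": 0.73,
}

# Observed student-sample correlations, as magnitudes quoted in the text.
PAIRS = [
    ("Conscientiousness", "Unconventional", 0.42),
    ("Insecurity", "Unconventional", 0.29),
    ("Introversion", "Tender Minded", 0.35),
    ("Insecurity", "Introversion", 0.38),
]

for x, y, r in PAIRS:
    corrected = disattenuate(r, ALPHA[x], ALPHA[y])
    print(f"{x} / {y}: observed {r:.2f}, corrected {corrected:.2f}")
```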


Table 4. Scale means and standard deviations for the PPQ personality and Invalidity V scale,
computed using the N = 1,472 UK mixed gender university student sample. The figures in brackets
are the values computed using the sample of 253 volunteer adults from the Biosignal Laboratory
panel (the Invalidity scores were not computed for this sample).


Scale                 Mean          Standard Deviation
Insecurity            3.5 (4.70)    3.0 (3.42)
Conscientious         4.8 (5.12)    3.2 (3.34)
Introversion          6.3 (7.06)    3.2 (3.43)
Tender-Minded         6.6 (6.68)    2.8 (2.88)
Unconventional        5.0 (5.53)    3.1 (3.44)
Invalidity V scale    3f            2.4
Scale means and standard deviations


Kline and Lapham (1991b and 1992a) report the global means and standard deviations
for each of the five PPQ personality scales, as well as for the Invalidity V scale.
Table 4 lists these. Also in this table are the personality scale means and standard
deviations computed using the Biosignal Laboratory sample of 253 volunteer adults.
Kline and Lapham (1991b), using the N = 1,472 student sample, also provide a
comprehensive set of tables of personality scale means and standard deviations as a
function of the frequency distribution of the Invalidity scores. These tables are useful
for determining a threshold for "inconsistent responding" on each of the scales.

Furthermore, Kline and Lapham (1992a) give a breakdown of the global student
sample into constituent samples of Arts, Sciences, Social Sciences, Engineers,
and Mixed groups. In Kline and Lapham (1992b), a new sample of data from 208
employees across 10 occupational groups in various organisations was obtained. The
means and standard deviations of the five PPQ scales are reported in that paper, along
with indications from analyses of variance and post-hoc Scheffé contrast tests
that certain occupational groups can be meaningfully differentiated using particular
scale scores.


Construct/concurrent validity 


The final set of data presented in Kline and Lapham (1991a) establishes some limited
construct validity for the PPQ scales. The test authors concurrently administered the 
Eysenck Personality Questionnaire (EPQ: Eysenck & Eysenck, 1975) to 100 mixed 
gender UK university students (56 male, 44 female). The four EPQ scales of Psy- 
choticism, Extraversion, Neuroticism and Social Desirability were subsequently 
correlated with the five PPQ scales. The results are presented in Table 5 below. 
These correlations indicate that the PPQ scales only marginally overlap with their 
nearest EPQ counterparts. For example, EPQ-Psychoticism correlates at just -.06 
with Tender-Minded (Tough-Minded being the opposite pole on the PPQ). As Kline
and Lapham indicate though, the PPQ Tough-Minded scale items contain no reference
to cruel, bizarre, or aggressive behaviours, so this lack of correlation is to be
expected. The strongest correlation is that between EPQ Extraversion and PPQ Introversion
(-.41, corrected for unreliability = -.52, using the alpha of 0.85 from the N
= 4,140 mixed gender UK reference EPQ sample). This is a moderate correlation
that indicates that a significant amount of variance, over and above that in common
with the EPQ, is being assessed by the PPQ.


Table 5. Correlations between the Eysenck Personality Questionnaire (EPQ) and PPQ personality
scales, using the scores from 100 UK mixed gender university students


                      EPQ
Scale                 Psychoticism   Extraversion   Neuroticism   Social Desirability
PPQ-Insecurity        -.14           -.45           .37           m
PPQ-Conscientious     -.17           -.02           .08           .22
PPQ-Introversion      -.19           -.41           .19           .28
PPQ-Tender-Minded     -.06           z15            M3            -.02
PPQ-Unconventional    .23            .30            -.20          -.18


Figure 1. Smallest Space Analysis of the 15FQ, EPQR, I7, MAB ability, and PPQ scales

Another set of data, partially presented in the Kline and Barrett (1994) paper,
provides some additional concurrent validity for the PPQ. The questionnaire was
administered to 253 Biosignal Laboratory panel volunteers (as detailed above). These
respondents had also completed the Eysenck Personality Questionnaire-Revised
(EPQR: Eysenck, Eysenck, & Barrett, 1985), the Fifteen Factor Personality Questionnaire
(15FQ: Paltiel & Budd, 1992), the Jackson Multidimensional Aptitude Battery
(MAB: Jackson, 1984), and the IVE-I7 (Eysenck, Pearson, Easting, & Allsopp,
1985) questionnaire. The 15FQ is a normative, three-option response format, personality
test that has been developed for use in research, industrial, and organisational
settings. The test consists of 191 items, assessing 15 bipolar personality dimensions
similar to those measured using Cattell's 16PF Form A. In addition, a 16th scale
provides a measure of motivational distortion that is similar to the concept of social
desirability as measured via the Eysenck Personality Questionnaire-Revised version
(EPQR, Eysenck, Eysenck, & Barrett, 1985). The Eysenck et al. IVE-I7 questionnaire
assesses Impulsivity, Venturesomeness, and Empathy/Sensitivity. The MAB assesses
ten of the subscales of the WAIS-R ability test in a group-administered and/or
individual computer-administered format, excluding the Digit Span subtest. As with
the WAIS-R, it provides estimates of Verbal, Performance, and Full-Scale IQ. The
total sample size for this concurrent study is 193 mixed-gender respondents (48 males,
145 females). Rather than provide the entire correlation matrix for the 28 scales,
a useful way of demonstrating concurrent validity is to compute a multidimensional
scaling solution for the scale score data matrix. Figures 1 and 2 provide the scaling
maps, with and without the MAB ability variables. The key to each figure provides
the full name of each scale abbreviation, and the regional area names represented by
the enclosed groups of scales.


Figure 2. Smallest Space Analysis of the 15FQ, EPQR, I7, and PPQ scales


Table 6. Key to abbreviations used in Figures 1 and 2.

Fifteen Factor Personality Questionnaire (15FQ)
FA = Outgoing
FC = Stable (scored as Instability)
FE = Assertiveness
FF = Enthusiastic
FG = Detail-Conscious
FH = Socially Bold
FI = Aesthetic Sensitivity
FL = Suspicious
FM = Conceptual
FN = Restrained (scored as Unrestrained)
FO = Self-Doubting
FQ1 = Radical
FQ2 = Self-Sufficient (scored as Group-Oriented)
FQ3 = Disciplined
FQ4 = Tense-Driven
FMD = Motivational Distortion

Eysenck Personality Questionnaire-Revised (EPQR)
Psy = Psychoticism
Extr = Extraversion
Neur = Neuroticism
SocD = Social Desirability/Conformity

Eysenck IVE-I7
I7-Imp = Impulsivity
I7-Vent = Venturesomeness
I7-Emp = Empathy/Sensitivity

Professional Personality Questionnaire (PPQ)
PPQ-Insec = Insecurity/Anxiety
PPQ-Extr = Extraversion
PPQ-Unconv = Unconventionality
PPQ-Consc = Conscientiousness
PPQ-Tend = Tender-Minded

MAB: Multidimensional Aptitude Battery
Voc = Vocabulary
Sim = Similarities
Dig = Digit Symbol
Com = Comprehension
Sp = Spatial
Pa = Picture Arrangement
Inf = Information
Ar = Arithmetic
Pe = Picture Completion
Ob = Object Assembly


Table 7. Pearson correlations between the PPQ scales, the Psytech 15FQ Personality scales, the
Eysenck I7 and EPQ-R Personality scales, and Jackson's Multidimensional Aptitude Battery (MAB)
ability scales


                                PPQ
N = 193 mixed gender cases      Insecurity   Tender-minded   Extraversion   Conscientious   Unconventional
15FQ Outgoing                   -.37         .02             .26            -.08            .13
15FQ Stability                  -.32         -.18            .13            .03             .06
15FQ Assertiveness              -.39         -.31            .35            -.09            .26
15FQ Enthusiastic               -.41         -.10            .41            -.12            .31
15FQ Detail Conscious           .04          -.16            -.08           .42             -.08
15FQ Socially Bold              -.38         -.06            .28            .07             .21
15FQ Aesthetic Sensitivity      .02          .43             Т2             .14             ET
15FQ Suspicious                 .10          -.17            .06            .16             .04
15FQ Conceptual                 .11          .17             -.05           -.15            .28
15FQ Restrained                 .31          -.19            -.29           .26             -.29
15FQ Self-Doubting              .24          -.17            -.19           «jg             -.10
15FQ Radical                    .15          -.02            .03            -.28            .41
15FQ Group-Oriented             .22          .16             -.15           .06             .05
15FQ Disciplined                .01          -.14            .02            .38             -.29
15FQ Tense-Driven               217)         -.21            -.10           .14             .00
15FQ Motivational Distortion    -.09         .01             -.07           sili            .05
EPQR Psychoticism               -.12         -.15            .31            -.31            .46
EPQR Extraversion               -.47         .08             .44            .03             .29
EPQR Neuroticism                .21          .17             -.12           .03             -.01
EPQR Social Desirability        .10          .04             -.24           .24             -.08
I7 Venturesomeness              -.20         -.12            .46            -.06            .25
I7 Impulsivity                  -.37         -.26            .38            -.17            .37
I7 Empathy-Sensitivity          722          .42             -.22           .06             -.05
MAB Information                 -.07         .05             -.14           -.27            .09
MAB Comprehension               -.07         .15             -.09           -.22            .03
MAB Arithmetic                  -.20         -.11            .02            -.30            -.02
MAB Similarities                -.04         .19             -.16           -.33            .01
MAB Vocabulary                  -.03         .09             -.21           -.21            -.15
MAB Digit Symbol                -.19         .05             -.04           -.24            -.07
MAB Picture Completion          -.13         -.10            -.06           -.15            -.11
MAB Spatial                     -.06         .05             -.08           -.22            .02
MAB Picture Arrangement         -.17         .06             .00            -.23            .02
MAB Object Assembly             -.22         -.02            .08            -.25            .09
MAB Verbal IQ                   -.11         .09             -.14           -.33            -.01
MAB Performance IQ              -.19         .00             -.02           -.27            .00
MAB Full-Scale IQ               -.16         .05             -.09           -.33            .00


*Note: correlations |.25| and above are highlighted

As can be seen from Figure 1, the PPQ and other personality variables show no 
substantive relationship to the ability variables. The correlations reported in Table 7 
between these variables and the 10 ability subtest variables confirm this (although 
Conscientiousness is negatively related, with an average correlation of -.26). In Figure 
2, it can be seen that the PPQ scales (as in Figure 1) are closely associated with the 
key personality regions of the map. What is also noticeable is that in these data, PPQ 
Insecurity and PPQ Tender-Minded are placed fairly close together in the Euclidean 
personality space; their Pearson intercorrelation in these data is .28. Further, instead 
of five regions being associated with the Big Five "factors", only four have been 
identified using these questionnaires. 
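
To give a rough feel for how a correlation of .28 translates into a distance, the sketch below uses one common convention: if two scales are mean-centred and rescaled to unit length, the Euclidean distance between them is sqrt(2(1 - r)). This is only an assumption made for illustration. The text does not state which proximity measure was fed to the scaling program, and distances read off a two-dimensional map are approximations in any case. The function name is hypothetical.

```python
import math


def correlation_to_distance(r: float) -> float:
    """Euclidean distance between two mean-centred, unit-length score
    vectors whose Pearson correlation is r: d = sqrt(2 * (1 - r))."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie in [-1, 1]")
    return math.sqrt(2.0 * (1.0 - r))


# r = .28 is the Insecurity / Tender-Minded correlation quoted in the text;
# the other values are reference points for comparison.
for r in (1.0, 0.28, 0.0, -1.0):
    print(f"r = {r:+.2f} -> d = {correlation_to_distance(r):.2f}")
```

Under this convention, uncorrelated scales sit at a distance of about 1.41 and perfectly correlated scales at 0, so the distance for r = .28 can be compared against those two reference points.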

As an additional aid to understanding the relationships between the PPQ factors 
and the other variables, all the correlations between the personality, ability, and PPQ scale 
scores are presented in Table 7. 

The correlations between the PPQ scales and other related personality scales do not 
exceed .50. Further, there is no substantive correlation between any of 
the ability scales and the PPQ scales except for the Conscientiousness scale. These latter 
correlations were all negative, with the highest observed value of -.33 for Full-Scale IQ, 
Verbal IQ, and the Similarities subtest. The complete scale score dataset is 
available from the author in SPSS and STATISTICA format. 
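
The summary figures quoted for Conscientiousness can be re-derived from the MAB rows of Table 7. The sketch below assumes that the fourth numeric column of those rows is the PPQ Conscientiousness column; this is inferred from its uniformly negative values and is not stated in the rows themselves. It computes the mean correlation over the ten subtests, the mean over all thirteen MAB variables (subtests plus the three IQ scores), and the most negative value together with the variables that reach it.

```python
from statistics import mean

# Fourth numeric column of the MAB rows in Table 7, read here as the
# PPQ Conscientiousness column (an assumption, see the paragraph above).
SUBTESTS = {
    "Information": -0.27, "Comprehension": -0.22, "Arithmetic": -0.30,
    "Similarities": -0.33, "Vocabulary": -0.21, "Digit Symbol": -0.24,
    "Picture Completion": -0.15, "Spatial": -0.22,
    "Picture Arrangement": -0.23, "Object Assembly": -0.25,
}
IQ_SCORES = {"Verbal IQ": -0.33, "Performance IQ": -0.27, "Full-Scale IQ": -0.33}

all_mab = {**SUBTESTS, **IQ_SCORES}
strongest = min(all_mab.values())
strongest_names = [name for name, r in all_mab.items() if r == strongest]

print("mean over the 10 subtests:     ", round(mean(list(SUBTESTS.values())), 2))
print("mean over all 13 MAB variables:", round(mean(list(all_mab.values())), 2))
print("most negative correlation:     ", strongest, strongest_names)
```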

In conclusion, the PPQ possesses the minimal psychometric properties required 
by a questionnaire for use as a measure of variants of the Big Five factors of McCrae 
and Costa (1987). It possesses adequate factorial validity, and internally consistent 
scales. It is relatively short, face-valid for the I/O work environment, and quick to 
complete. There is a limited amount of construct/concurrent validity with some other 
multi-scale personality tests and a multidimensional ability test. However, given the 
lack of substantive normative non-student data, it is probably best considered a 
"research-only" questionnaire until more norms, further factor analysis, and test-retest 
reliability indices are obtained. 


Appendix A: The 68-item PPQ including administration instructions 


Instructions 


In this test there are a number of statements which complete the unfinished sentence 
"Ideally, I would like to work in a job setting where ..." You should respond to each statement 
by circling Yes if you agree with the statement or No if you disagree. 


For example: Ideally I would like to work in a job setting where ... The workday is 10am – 
6pm as opposed to 9am – 5pm. 


If your ideal job-setting is one in which the work day is 10am – 6pm then circle Yes. If you 
prefer 9am – 5pm then circle No. Remember this is not necessarily the way it is where you 
are working now or where you have worked before. It is the way things would be if 
everything was exactly the way you wanted it: your IDEAL job setting. 


The following points should be remembered when completing the questionnaire. 


There are no right or wrong answers - be as honest as you can and do not give an answer 
because it seems the right thing to say. 


If you want to change an answer delete it completely and then circle your new response. 


There is no time limit — however you should work as quickly as you can without pondering 
over any one question at length. 


Do not worry about being consistent — answer each question individually. 


There are 68 questions in this questionnaire — please ensure that you complete ALL the 
questions. 


Turn over and begin the test when you are told. 


Ideally I would like to work in a job setting where ... 


1. It is easier to get ahead by plodding on deliberately than by taking chances. 
2. More liberal methods are favoured over traditional ones. 
3. People with the necessary abilities are taken on even if they are not punctual. 
4. People believe they should be effective first, supportive second. 
5. The atmosphere is calm and steady as opposed to fast and pressured. 
6. It's important to everyone that shelves are dusted and floors vacuumed and/or dusted every week. 
7. It's generally accepted that to get ahead you have to break a few rules. 
8. I am often the centre of attention. 
9. Attending to minute detail is not considered the only way to do an acceptable job. 
10. The job involves more high profile activity than activity taking place behind the scenes. 
11. The work requires me to be more empathic than logical. 
12. Moderation, discipline and self-control are three of the emphasised values at work. 
13. If it had to be one or the other I would be considered well-liked as opposed to the best in my field. 
14. Employees are expected to maintain a particularly high standard of order and tidiness in their own personal workspace. 
15. Unexpected situations, both good and bad, often occur. 
16. Colleagues would describe me as more tough than sensitive. 
17. Most of the work involves long term projects requiring a steady pace as opposed to short term projects requiring rapid action. 
18. The impression that I make on people is not that important to my success. 
19. I am rarely called upon to inspire confidence in others. 
20. I am often making decisions that are crucial to the company. 
21. Deadlines are rarely set – they are seen as limiting. 
22. Quick decision making is favoured over taking time to contemplate issues. 
23. Employees avoid pushing ideas that require stepping on a few toes. 
24. Competitive people get ahead most quickly. 
25. The methods I am dealing with could be described as more novel than established. 
26. I often am expected to take and/or offer advice. 
27. The work requires me to be more imaginative than pragmatic. 
28. Employees are more concerned with expressing individuality than identifying with one another. 
29. The atmosphere is fast and pressured as opposed to calm and steady. 
30. Waste not want not is one of the main rules emphasised at work. 
31. I am directly answerable to someone who offers advice and keeps track of my progress as opposed to being a free agent. 
32. It's the employee's prerogative whether to keep his/her personal workspace tidy or untidy. 
33. Employees could be described as more creative than practical. 
34. I am frequently in the position where I am asked for my opinion. 
35. It's mandatory that employees check general files and/or supplies every week to make sure that they're arranged categorically. 
36. If a choice had to be made, I would be putting ideas into action rather than being the one who comes up with them. 
37. Colleagues would describe me as more firm than compliant. 
38. I am hardly ever in the position where I have to advance my own views and challenge other people's. 
39. If a choice had to be made, I'd associate more frequently with powerful people as opposed to people I'm close to who have little influence. 
40. Colleagues would describe me as more sensitive than tough. 
41. The hierarchy is strictly defined – I'm expected to treat superiors with greater respect than I would colleagues and people in lower status positions are expected to do the same for me. 
42. I rarely am making decisions that are crucial to the success of the company. 
43. If a choice had to be made, I'd have high job security with a mediocre income rather than low job security with a higher income. 
44. It's not of much concern if shelves don't get dusted or floors don't get vacuumed and/or mopped every week. 
45. The work requires me to be more subjective than objective. 
46. I am required to present my ideas face-to-face more often than in writing. 
47. If a choice had to be made, I'd be developing new ways of doing things as opposed to improving standard methods. 
48. Unexpected situations, good or bad, rarely occur. 
49. The general approach to work could be described as more conventional than progressive. 
50. Taking time to contemplate issues is favoured over quick decision making. 
51. I rarely am expected to take advice and/or offer it. 
52. Most of my work involves independent as opposed to group projects. 
53. At times the atmosphere is hectic and rushed. 
54. The work requires me to be more demanding than consenting. 
55. Most of the work involves short-term projects requiring rapid action as opposed to long-term projects requiring a steady pace. 
56. The methods I am dealing with could be described as more established than novel. 
57. I am often called upon to inspire confidence in others. 
58. A place for everything and everything in its place is a rule that everyone must follow. 
59. There is a standard of dress that people are expected to follow. 
60. I am a free agent as opposed to directly answerable to someone who offers advice and keeps track of my progress. 
61. The work requires me to spend most of my time travelling around each day rather than remaining on site. 
62. I can maintain a daily routine that is rarely broken. 
63. Lateness is intolerable. 
64. People believe they should be supportive first, effective second. 
65. The work requires me to be more logical than empathic. 
66. I rarely go into work knowing exactly what I'll be doing every hour; I just have a general idea and take things as they come. 
67. I am in a lower status position where mistakes have little impact as opposed to a higher status position where mistakes can have serious consequences. 
68. The work requires me to be more pragmatic than imaginative. 


Appendix B: The PPQ Score Key 


Insecurity vs Confidence (Anxiety) – 15 items 

Score 1 if response is Yes: 18, 19, 23, 38, 42, 48, 51, 62, 67 
Score 1 if response is No: 15, 20, 26, 34, 46, 57 

A high score on the scale indicates Insecurity 


Conscientiousness vs Careless – 13 items 

Score 1 if response is Yes: 6, 12, 14, 30, 35, 41, 58, 59, 63 
Score 1 if response is No: 3, 9, 32, 44 

A high score on the scale indicates Conscientiousness 


Introversion-Extraversion – 13 items 

Score 1 if response is Yes: 1, 5, 17, 43, 50 
Score 1 if response is No: 7, 8, 10, 22, 24, 29, 53, 55 

A high score on the scale indicates Introversion 


Tough vs Tender-Minded (Agreeableness) – 12 items 

Score 1 if response is Yes: 11, 13, 40, 45, 64 
Score 1 if response is No: 4, 16, 37, 39, 52, 54, 65 

A high score on the scale indicates Tender-Minded 


Conventional vs Unconventional (Openness to Experience) – 15 items 

Score 1 if response is Yes: 31, 36, 49, 56, 68 
Score 1 if response is No: 2, 21, 25, 27, 28, 33, 47, 60, 61, 66 

A high score on the scale indicates Conventionality 
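
The five keys above amount to a simple counting rule, which can be sketched in a few lines of Python. The item numbers are transcribed from the key; the dictionary layout, the scale labels used as dictionary keys, the function name `score_ppq`, and the decision to reject incomplete answer sheets are illustrative assumptions rather than part of the published procedure.

```python
# Item numbers transcribed from the PPQ score key above.
SCORE_KEY = {
    "Insecurity": {"yes": [18, 19, 23, 38, 42, 48, 51, 62, 67],
                   "no": [15, 20, 26, 34, 46, 57]},
    "Conscientiousness": {"yes": [6, 12, 14, 30, 35, 41, 58, 59, 63],
                          "no": [3, 9, 32, 44]},
    "Introversion": {"yes": [1, 5, 17, 43, 50],
                     "no": [7, 8, 10, 22, 24, 29, 53, 55]},
    "Tender-Minded": {"yes": [11, 13, 40, 45, 64],
                      "no": [4, 16, 37, 39, 52, 54, 65]},
    "Conventionality": {"yes": [31, 36, 49, 56, 68],
                        "no": [2, 21, 25, 27, 28, 33, 47, 60, 61, 66]},
}


def score_ppq(responses):
    """Return the five raw scale scores.

    responses maps each item number (1-68) to True for Yes or False for No.
    """
    missing = [item for item in range(1, 69) if item not in responses]
    if missing:
        raise ValueError(f"unanswered items: {missing}")
    scores = {}
    for scale, key in SCORE_KEY.items():
        yes_points = sum(1 for item in key["yes"] if responses[item])
        no_points = sum(1 for item in key["no"] if not responses[item])
        scores[scale] = yes_points + no_points
    return scores


# Usage: a respondent who circles Yes on every item.
all_yes = {item: True for item in range(1, 69)}
print(score_ppq(all_yes))
```

Each of the 68 items appears in exactly one scale key, so the five raw scores together account for every response.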


Invalidity V scale – 15 item pairs 

If both items in a pair are answered the same way ("Yes" on both or "No" on both), the 
respondent scores 1 on the Invalidity scale for that pair, and 0 otherwise. For example, a 
respondent who answers "Yes" to question 4 and "Yes" to question 64 scores 1 on the 
Invalidity scale, as the two items are directly reversed in meaning. 

A high score on the scale indicates invalid/inconsistent responding 


Score 1 if both responses are the same for these item pairs: 


4, 64     19, 57 
5, 29     20, 42 
6, 44     22, 50 
11, 65    22, 56 
14, 32    26, 51 
15, 48    27, 68 
16, 40    31, 60 
17, 53 
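
The Invalidity rule can be sketched in the same way: one point for every reversed pair that receives the same answer twice. The function name and the response format below are hypothetical, and only three of the fifteen pairs are used so that the example stays short; a real scoring run would pass the full pair list from the key.

```python
def invalidity_score(responses, pairs):
    """Count the reversed-item pairs that received identical answers.

    responses maps item number to True (Yes) or False (No);
    pairs is a list of (item, reversed_item) tuples.
    """
    return sum(1 for first, second in pairs if responses[first] == responses[second])


# Three of the fifteen pairs from the key, for illustration only.
EXAMPLE_PAIRS = [(4, 64), (5, 29), (6, 44)]

example_responses = {
    4: True, 64: True,    # same answer twice: counts towards Invalidity
    5: True, 29: False,   # opposite answers: consistent, does not count
    6: False, 44: False,  # same answer twice: counts towards Invalidity
}
print(invalidity_score(example_responses, EXAMPLE_PAIRS))
```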
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Appendix C: PPQ SCORE DESCRIPTIONS 


Raw Score 0 - 3   Low Extraversion   Std. Score 0 - 2 
Your score on extraversion was lower than average. This means that you are the sort of 
person who is somewhat quiet and retiring and prefers to have a few real friends rather than a 
large number of superficial acquaintances. This score suggests that you would be happy in 
jobs where you had to be on your own quite a bit of the time and did not meet new people 
all the time. Quiet jobs rather than bustling environments are what you prefer. 


Raw Score 4 - 8   Average Extraversion   Std. Score 3 - 7 
Your score on extraversion was around the average for your group. This means that you can 
get on with people but are not particularly sociable or gregarious. On the other hand, you are 
not withdrawn or shy, or one who likes being on his or her own particularly. This score 
means that a wide variety of job settings would be quite suitable for you, although ones where 
you were much of the time on your own or ones where you had to meet new people every 
day would not be ideal. 


Raw Score 9 +   High Extraversion   Std. Score 8 + 
Your score on extraversion was higher than average. This means that you are an outgoing, 
sociable person who enjoys meeting and gets on well with other people. You are at your 
best in working situations where it is necessary to get along with a variety of people and 
where you do not have a lot of repetitive and careful work, or where you have to be on your 
own for long periods of time. 


Raw Score 0 - 1   Low Confidence   Std. Score 0 - 2 
Your score was below average for confidence. This has no clinical implications because our 
test was specifically designed for normal people who are not likely to require psychiatric 
care. Your score means that generally you do not feel particularly confident in difficult or 
strange situations, or situations where you are highly responsible for other people. This 
means that you feel happier in jobs where there is a good routine and where people know what 
is expected of them, even when this may be very difficult. Many professional positions are 
of this kind. 


Raw Score 2 - 5   Average Confidence   Std. Score 3 - 6 
Your score was around the average for confidence for your group. 
This means that you are well capable of dealing with all the normal stresses and strains of 
life. It means that jobs which require you to act quickly and take responsibility will not be 
too irksome for you and are unlikely to cause you sleepless nights. 


Raw Score 6 +   High Confidence   Std. Score 7 + 
It is better not to have this category. The meaning of such a score is not clear and feedback 
of this kind could be dangerous. It is better to combine the category with the one above. 


Raw Score 0 - 4   Low Tough-mindedness   Std. Score 0 - 3 
Your score on tough-mindedness was below average. This means that you are a somewhat 
sympathetic person who does not like to make decisions that bear adversely upon the lives 
and feelings of others. This sensitivity to feelings means that you would be suited to jobs 
where this quality is regarded as necessary. The helping professions and personnel 
management are obvious possibilities. 


Raw Score 5 - 8   Average Tough-mindedness   Std. Score 4 - 6 
Your score on tough-mindedness was about average for your group. This means that you 
can take decisions which affect people adversely and hurt their feelings but you do not enjoy 
it and would rather not have to. This means that you would be suited to jobs where decisions 
about the fates and careers of people had to be made every once in a while but were not a 
major part of the work. 


Raw Score 9 +   High Tough-mindedness   Std. Score 7 + 
Your score was above average for tough-mindedness. This means that you do not object to 
taking decisions even where you may have to ruffle people's feelings or pride. You are able 
to face unpopularity if it is necessary and some people might regard you as ruthless. This 
means that you would be happy in jobs where it was necessary to be tough, as in some 
aspects of management and industry. You would be unlikely to be happy in jobs where it was 
important to be sympathetic to people or to help those who can't help themselves. 


Raw Score 0 - 2   Low Openness to Experience   Std. Score 0 - 3 
Your score on the measure of openness to experience was below average. This means that 
you are happiest working with tried and tested methods, where you can use approaches to 
problems that you know will work. You are suspicious of new things until they have been 
demonstrated to be effective. This means that you are best in jobs where you can use good, 
sound methods. Many of the professions are of this kind. 


Raw Score 3 - 6   Average Openness to Experience   Std. Score 4 - 6 
You are around the average on our measure of openness to experience. This implies that 
you are capable of adapting to and using new methods but are also quite happy to use old 
ones, provided that they are efficient. This means that you are quite adaptable for most jobs 
but generally prefer those where there is an emphasis neither on completely new approaches 
nor on the rigid application of traditional methods. 


Raw Score 7 +   High Openness to Experience   Std. Score 7 + 
You are above average on the measure of openness to experience. This means that you 
enjoy novel methods and new approaches to thinking about things. You prefer trying new 
things out even when there are accepted ways of doing them. You would be happy in jobs that 
exploited these characteristics, where you were allowed to do things your way and be 
generally creative. 


Raw Score 0 - 2   Low Conscientiousness   Std. Score 0 - 2 
You scored below average on the conscientiousness scale. This means that you are not too 
concerned with the trivial details of how things are done but are more interested in the end 
result. Rules and regulations are only important where there is no other way and you see the 
pettiness of rule bound officials as just an excuse for not thinking. You would be happiest in 
jobs where you could do things in your own way and in your own time. You would hate 
regimented positions such as the armed forces, or jobs where you always had to wear a suit. 


Raw Score 3 - 7   Average Conscientiousness   Std. Score 3 - 6 
You obtained an average score for conscientiousness. This means that you are conscientious 
and pay due regard to rules and regulations. You do things properly but are not 
heavy-handed about it and are not completely upset if you or a colleague are a tiny bit late or forget 
something. This means that you would fit well into most jobs except those where there was 
excessive emphasis on rules and regulations and the opposite of this where there was 
complete laxity. 


Raw Score 8 +   High Conscientiousness   Std. Score 7 + 
You are above average on conscientiousness. This means that you are a person with very 
high standards who likes everything to be done properly. You believe that old-fashioned 
virtues like punctuality, neatness and regard for rules and regulations are important. You 
prefer jobs where these qualities can come into play and are rewarded. Some accounting, 
legal and administrative positions are of this kind, as are jobs in the armed forces and police. 
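
The raw-score bands in this appendix can be collected into a small lookup, sketched below. The cut-offs are the raw-score ranges listed above; the table layout and the function name `band` are illustrative assumptions. The sketch takes a raw score already expressed on the trait as it is named in this appendix. It does not address how the key in Appendix B, which is scored towards Introversion, Insecurity, Tender-Mindedness and Conventionality, maps onto these traits, since the appendices do not state that conversion.

```python
# For each trait: (highest raw score in the Low band,
#                  highest raw score in the Average band).
BAND_CUTOFFS = {
    "Extraversion": (3, 8),
    "Confidence": (1, 5),
    "Tough-mindedness": (4, 8),
    "Openness to Experience": (2, 6),
    "Conscientiousness": (2, 7),
}


def band(trait, raw_score):
    """Return 'Low', 'Average' or 'High' for a raw score on the given trait."""
    if raw_score < 0:
        raise ValueError("raw scores cannot be negative")
    low_max, average_max = BAND_CUTOFFS[trait]
    if raw_score <= low_max:
        return "Low"
    if raw_score <= average_max:
        return "Average"
    return "High"


# Usage: look up the feedback band for a few raw scores.
for trait, raw in [("Extraversion", 3), ("Extraversion", 9), ("Conscientiousness", 5)]:
    print(trait, raw, band(trait, raw))
```

The lookup keeps a separate High band for Confidence; following the note under that heading, a user may prefer to report such scores with the Average description instead.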
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